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SECTION  1 


INTRODUCTION 

This  is  the  final  report  for  a  three-year  research  program  on 
imaging  with  thinned  sparse  arrays  of  antennas  or  apertures.  In  the 
last  year  we  have  studied  the  foundations  of  the  maximum  entropy  (ME) 
estimation  method  and  its  application  to  the  restoration  and  super¬ 
resolution  of  images  that  have  been  degraded  by  undersampling  and  by 
the  finite  aperture  of  the  sparse  array.  Extensive  computer  simulations 
were  performed  on  one-  and  two-dimensional  image  signals  to  test  the 
theoretical  formulations  of  ME.  This  method  of  approach  gives  the 
results  a  certain  generality  in  wavelength  and  physical  scale  that 
would  have  been  extremely  time  consuming  to  obtain  using  the  data 
from  direct  physical  experimentation. 

The  simulations  of  imaging  with  sparse  arrays  provided  important 
and  surprising  results.  We  summarize  here  the  important  points  of 
our  previous  two-year  study  of  these  arrays.  Further  details  can  be 
found  in  the  annual  report  for  this  contract  (June  1979) .  Very  high 
resolution  imaging  systems,  regardless  of  wavelength,  require  large 
apertures  with  high  resolution,  low  side  lobe  level,  and  high  gain. 

These  apertures  are  often  considered  to  be  technically  and  economically 
infeasible.  We  investigated  the  efficacy  of  using  very  sparse  arrays 
of  randomly  placed  small  antenna  elements  in  imaging  systems.  The 
properties  of  these  arrays  are  established  in  antenna  theory.  What 
we  have  done  is  to  make  the  first  demonstration  of  their  application 
in  imaging  systems.  Computer  simulation  studies  were  made  for  coher¬ 
ent,  partially  coherent,  and  incoherent  imaging.  These  were  done 
using  various  conditions,  including  variable  signal-to-noise  ratio 
and  phase  aberration,  but  with  monochromatic  illumination.  We  showed 
that,  for  incoherent  imaging,  a  large  degree  of  thinning  down  to  6% 
of  the  number  of  full  array  elements  (4096)  produces  image  quality 
comparable  to  that  produced  by  the  full  array.  For  broad-band 
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polychromatic  incoherent  imaging,  even  larger  degrees  oi  thinning, 
down  to  3%,  were  similarly  effective.  The  fracional  degree  of  thinning 
allowable  for  incoherent  imaging  is  in  principle,  inversely  propor¬ 
tional  to  the  square  root  of  the  number  of  elements  in  the  full  array, 
which  is  equivalent  to  the  number  of  elements  resolved  in  the  full 
array.  The  arrays,  which  are  easy  to  design,  degrade  gracefully 
when  large  fractions  of  the  elements  are  removed.  For  coherent 
imaging  applications,  random  arrays  do  not  offer  significant  advantages 
over  full  arrays. 

The  body  of  this  report  describes  the  results  of  the  last 
contract  year's  work  on  the  ME  estimation  method.  Both  the  theory  and 
application  are  described  along  with  the  results  of  computer  simula¬ 
tions  on  one-  and  two-dimensional  images.  The  meaning  and  nature  of 
the  ME  method  is  discussed  in  Section  2,  where  the  meaning  of  entropy 
in  this  problem  is  clarified.  A  generalization  is  also  presented 
that  reconciles  the  two  contending  forms  of  entropy  seen  in  the  ME 
literature  by  showing  them  to  be  special  limiting  cases  of  a  more 
general  entropy,  based  on  quantum  statistics.  The  actual  physical 
nature  of  the  source  and  measurement  apparatus  determines  the  appro¬ 
priate  limiting  form  of  entropy.  The  commonly  used  information 
theoretic  entropy  is  shown  to  be  one  such  limiting  form. 

An  analogy  is  made  between  the  ME  problem  and  the  free-energy 
minimization  problem,  familiar  in  statistical  mechanics.  This 
analogy  allows  fluctuation  noise  and  signal-to-noise  concepts  to  be 
introduced  naturally  into  the  theory  and  practice  of  the  method  via 
an  effective  noise  temperature,  if  the  noise  or  signal-to-noise  is 
known  a  priori  by  independent  measurement  from  an  ensemble  of 
measurements . 

There  is  a  closed-form  solution  to  the  ME  problem  in  one  dimension 
for  the  case  of  Gaussian  statistics.  In  that  situation,  matrix 
inversion  for  image  signals  of  small  extent  or  recursion  for  larger 
ones  can  readily  be  used  to  solve  the  problem.  No  closed-form 
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solution  exists  for  any  other  case,  and  iterative  algorithms  must  be 
employed  in  digital  computations.  We  studied  two  methods:  the  natural- 
iteration  method,  which  guarantees  convergence,  and  the  much  faster 
Newton-Ralphson  method  which  does  not.  Examples  of  one-  and  two- 
dimensional  and  computer  simulations  of  restoration  and  superresolution 
are  given  in  Section  4.  No  approximation  was  needed  for  nonseparable 
two-dimensional  point-spread  functions. 

A  question  rarely  asked  in  ME  estimation  work  is  what  degree 
of  confidence  should  be  ascribed  to  the  estimates.  We  develop  in 
Section  5  a  general  method  for  the  calculation  of  the  degree  of  confi¬ 
dence  in  the  ME  estimates,  suitable  for  two  dimensions  and  not 
restricted  to  Gaussian,  independent  statistics.  This  method  is 
applied  to  a  set  of  ME  estimates  consisting  of  estimated  objects, 
all  differing  in  the  degree  of  superresolution  and  differing  further 
in  the  degree  of  noise  assumed.  The  multidimensional  (49  dimensional 
in  our  example)  nature  of  the  results  makes  a  two-dimensional  graphical 
representation  of  the  results  difficult,  but  we  have  chosen  to 
attempt  it  in  order  to  supplement  a  purely  algebraic  statement. 

As  a  corollary  of  our  study  and  understanding  of  the  entropy 
in  the  ME  method,  we  noticed  that  many  authors,  while  recognizing 
the  importance  of  the  Poisson  statistical  regime  for  imaging  and 
communications,  have  incorrectly  evaluated  the  probabilities  and 
channel  capacities  for  the  Poisson  distribution.  They  incorrectly 
used  the  wrong  entropy  or  information  expression  for  this  case. 
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SECTION  2 


MAXIMUM  ENTROPY  ESTIMATION:  THE  GENERAL  POINT  OF  VIEW 

Maximum  Entropy  has  been  used  in  various  forms  and  for  various 
ends  in  recent  years.  Applications  ranging,  for  example,  from  radar 
filter  formation  to  beam  forming,  from  seismic  spectral  estimation, 
to  astronomical  imaging  and  even  to  economic  forecasting,  have  all 
been  enriched  by  this  general  technique.  The  particular  techniques 
and  philosophical  points  of  view  vary  widely  among  the  several 
disciplines,  each  working  independently  and  publishing  in  specialized 
journals . 

The  interpretation  we  present,  which  we  believe  to  be  the 
most  general  and  which  includes  others  as  special  more  limited  cases, 
is  based  on  two  considerations.  Any  image,  measured  as  signal, 
pattern,  or  spectrum,  whatever  the  variables  represent,  is  a 
necessarily  degraded  version  of  the  true  object.  It  is  necessarily 
degraded  because  a  real  measurement  system  has  limited  bandwidth 
or  finite  point  impulse  response  function,  because  the  time  sample  is 
finite,  and  possibly  because  the  measurements  were  undersampled. 
Furthermore,  noise  cannot  be  ignored. 

Many  different  possible  object  patterns  (whatever  the  object 
variables  represent)  all  differ  in  detail,  but  can  produce  the  same 
measured  image  pattern.  This  ambiguity  can  be  resolved  by  applying 
the  ME  method.  In  our  interpretation,  the  second  consideration  is 
that  a  probability  is  assigned  to  each  possible  pattern  and  the 
most  probable  pattern  is  chosen  as  the  estimated  or  restored  object. 
Patterns  are  assigned  probabilities  based  on  the  physics  and  statistics 
of  the  immediate  problem.  The  entropy  is  understood  to  mean  the 
logarithm  of  the  probability.  So  to  find  a  maximum  of  the  entropy 
is  to  find  a  maximum  of  the  probability. 

From  this  point  of  view  ME  is  a  method  to  estimate  the  true 
object  (or  its  transform)  by  maximizing  its  entropy  subject  to  the 
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measured  image  data  constraints.  In  the  next  section  we  develop  this 
idea  in  an  analogy  between  ME  and  the  well  known,  and  much  practiced 
statistical  mechanical  principle  of  the  minimization  of  the  free  energy 
and  derive  some  useful  benefits  from  the  analogy. 

Here,  some  comments  are  in  order  about  a  much  repeated  "principle 
of  maximum  entropy"  put  forward  by  Jaynes  which  is  often  used  to  justify 
the  ME  method.  In  the  particular  form  it  is  often  quoted  in  ME, 
exploiting  the  relationship  between  information  and  entropy,  it 
states  that  the  ME  estimate  is  the  least  prejudiced  one  bringing  no 
additional  information  from  tacit  structures  and  assumptions.  Any 
such  implicit  features  would  decrease  the  entropy  from  its  maximum. 

We  do  not  need  to  employ  a  new  principle  to  formulate  the  ME  problem, 
we  maximize  the  probability  and,  following  Boltzmann,  we  identify  the 
logarithm  of  probability  with  the  entropy. 

An  information  theoretic  point  of  view  can  enrich  and  sharpen 
our  understanding  of  the  relationship  between  the  observer  and  the 
observed,  the  knower  and  the  known,  but  it  has  sometimes  been  subject 
to  misuse  and  misunderstanding.  The  information  theoretic  entropy  of 
Shannon,  -f  log  f,  is  often  employed  inappropriately.  This  form  is 
appropriate  only  when  there  is  no  underlying  a  priori  probability 
distribution  departing  from  an  equal  a  priori  weighting.  For  example, 
it  would  be  appropriate  for  the  ubiquitous  gaussian  case  but  not  so 
for  the  poisson  case. 

These  considerations  regarding  the  entropy  expression  have  been 
developed  at  length  in  our  paper  "Maximum  Entropy  Image  Restoration: 

I.  The  Entropy  Expression,"  published  in  the  Journal  of  the  Optical 
Society  of  America  and  reprinted  in  Appendix  A.  The  interested 
reader  is  referred  there  for  a  detailed  exposition;  only  a  summary  is 
given  here. 

There  are  two  schools  employing  two  different  forms  of  entropy 
in  the  ME  problems  they  are  solving.  We  call  them  the  "log  B"  and 
"-B  log  B"  schools,  where  B  is  the  local  or  instantaneous  brightness. 
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power,  intensity,  or  their  spectral  counterparts.  We  derive  these 
two  expression:,  as  limiting  cases  of  a  more  general  entropy  forumula 
based  on  the  underlying  properties  and  statistics  of  the  physical 
source  and  of  the  measurement  process.  In  the  case  of  photon  or 
electromagnetic  signaling  or  imaging,  the  Bose-Einstein  statistics 
and  in  the  case  of  electrons,  the  Kermi-Dirac  statistics  are 
employed.  The  n  quantum  mechanical  particles  comprising  the  bright¬ 
ness  are  distributed  over  z  degrees  of  freedom  as  calcualted  by 
these  statistics.  The  number  of  degree.-,  oi  freedom  is  a  function 
of  both  the  source  and  the  measurement  and  estimation  process.  It 
can  be  understood  in  phase  space  in  terms  of  the  ratio  of  the  sizes 
of  the  detection  volume  to  the  coherence  volume  of  the  particles. 

The  number  of  degrees  of  freedom  is  an  extension  of  the  familiar 
idea  of  the  time-bandwidth  product  to  include  the  conjugate  variables 
of  space-reciprocal  space  as  well.  The  entropy  to  be  maximized  is 
the  logarithm  of  the  probability  as  given  by  the  physical  statistics 
of  the  problem,  following  the  original  meaning  of  entropy  as  given  by 
Boltzmann  and  Planck.  The  entropies  log  B  and  -B  log  B  result  in  the 
limit  n>>z  >>1  and  n<<  z,  respectively.  When  n  is  interpreted  as  an 
average  n  over  an  ensemble,  we  find  in  addition  the  log  B  expression 
when  n>>z  =  1.  The  Burg  form  used  in  spectral  estimation,  for 
example,  is  log  B.  The  distribution  function  for  this  case  is  the  exp< 
nential  in  intensity,  or  Gaussian  in  complex  amplitude.  Shannon's 
entropy  expression  is  shown  to  be,  in  the  same  way,  a  special  case 
of  the  more  general  result  and  is  appropriate  for  his  special  interest 
namely,  the  z  =  1  Gaussian  complex  amplitude  case  of  equal  a  priori 
probabilities. 

We  show  in  Figure  1,  by  way  of  illustrating  the  n/z  concept  with 
a  familiar  example,  a  plot  of  the  wavelength-temperature  dependence 
of  a  blackbody  for  given  average  number  of  photons  per  mode  (or  per 
degree  of  freedom)  n/z. 
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Figure  1.  Wavelength-temperature  dependence  of  a  blackbody 
for  given  average  numbers  of  photons  per  mode 
(or  per  degree  of  freedom),  n/z. 


The  MK  method,  in  summary,  may  be  viewed  as  a  nonlinear  inversion 
or  deconvolution  method  for  estimation  from  partial  and  noisy  data. 

It  can  be  used  to  extrapolate  or  interpolate  in  the  spaces  of  the 
measured  variables  or  their  conjugates,  as  in  Fourier  transform 
space.  Since  it  can  achieve  an  inversion  beyond  the  algebraically 
allowable  spatial,  temporal,  or  spectral  limit  of  the  measured  data, 
it  can  be  said  to  produce  a  superresolution.  In  the  next  sections 
we  describe  our  method  of  solution,  describe  a  useful  thermodynamic 
analogy,  and  give  some  examples  in  cine  and  two  dimensions.  Finally, 
we  address  the  question  of  the  degree  of  confidence  in  the  estimates. 
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SECTION  3 

THE  ME  METHOD  AND  ITS  INTERPRETATION 

In  this  section  we  describe  oar  particular  formulation  of  the 
ME  method  and  present  a  useful  thermodynamic  interpretation  in  terms 
of  the  free  energy.  This  interpretation  permits  us  to  deal  with  the 
problem  of  fluctuations  or  noise  in  a  natural  way.  The  Burg  method 
and  the  relationship  of  ME  to  the  Bayesian  point  of  view  is 
discussed . 

A.  FORMULATION  OF  THE  ME  PROBLEM 

We  begin  with  a  set  of  measurement  data  in  one  or  more  dimensions, 
a  known  noise  or  s ignal-to-noise  ratio  (S/N)  ,  a  known  instrumental 
and  transmission  path  point-spread-function  for  measured  spatial 
variables,  and  impulse  response  function  for  measured  temporal 
variables.  I f  we  are  measuring  or  estimating  spectra  in  the 
reciprocal  or  conjugate  space  of  spatial  or  temporal  frequency,  we 
must  then  know  the  detailed  behavior  of  the  band-limited  transfer 
function  or  time-limited  sampling  function  of  the  instruments.  The 
goal  of  the  method  will  be  to  estimate  a  finer  resolution  version  of 
the  original  measurement  datum  (or  its  spectra),  or  equivalently,  an 
extrapolated  version  of  the  original  measurement  datum  (or  its  spectra. 
An  example  from  image  estimation  will  help  to  illustrate  these  ideas. 

Suppose  an  unknown  object  {Ob}  convoluted  ©  wi th  a  known 
instrumental  point  spread  function  (PSF)  is  observed  as  an  image 
signal  { I }  . 

We  may  write 


10b  t  ®  PSF  =  {  I  1 


(1) 
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Naturally,  there  are  fewer  independent  variables  or  resolution 
elements  (space-spatial  frequency  product  elements)  in  the  image 
than  there  are  potentially  in  the  object.  There  are  many  possible 
objects,  that  could  produce  the  same  imsge.  There  is  a  many-to-one 
mapping  of  the  variables  that  might  describe  the  object  to  the 
variables  that  describe  the  image  signal.  One  ME  problem  is  to 
invert  Eq.  1  to  get  an  estimate  of  the  object  {Ob}  with  some  desired 
degree  of  resolution  or  superresolution.  The  carrot  *  symbol  denotes 
estimates . 

Inversion  is  possible  using  linear  methods,  but  only  up  to  the 
band  limit,  resolution  limit,  or,  the  uncertainty  limit.  The 
problem  is  that  of  inverting  singular  matrices  or  dividing  by 
zeros,  or  near  zeros.  All  linear  methods  are  basically  equivalent  in 
sharing  this  difficulty. 

The  classical  analysis  by  Slepian  and  Pollock  showed  that  at 
and  beyond  the  resolution  or  uncertainty  limit  the  eigenvalues  fall 
rapidly  to  neglible  near-zero  values.  Thus,  any  finite  noise,  whether 
physical  noise  in  the  signal  or  numerical  noise  in  the  digital  computer, 
renders  superresolution  impossible  by  linear  inversion  schemes.  As 
a  nonlinear  estimate  method,  ME  diminishes  these  difficulties.  Like 
other  nonlinear  methods,  such  as  the  iterative  method  of  constrained 
positivity,  ME  can  achieve  superresolution  estimates  in  the  presence 
of  noise.  (We  incorporate  a  positivity  condition  in  our  ME  formalism 
in  a  natural  non  ad  hoc  way.) 

Only  the  one-dimensional  ME  problem  for  the  case  of  Gaussian  statis¬ 
tics  has  a  closed-form  solution  (discovered  by  Burg).  This  problem  can 
be  solved  by  direct  matrix  inversion,  although  recursion  schemes  are 
usually  employed  for  speed  and  economy  of  calculation.  In  higher  dimen¬ 
sional  problems,  such  as  two-dimensional  pictorial  imaging  or  synthetic 
aperture  radar,  no  closed-form  solution  exists.  Furthermore,  for  anv 
statistics  other  than  Gaussian,  no  closed-form  solution  exists.  For 
example,  for  Poisson  statistics,  the  problem  must  be  solved  numerically. 
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Our  formulation  of  the  ME  problem  is  to  assign  a  probability  to  each 
possible  object,  based  on  the  physical  statistics  of  the  problem 
at  hand,  and  choose  the  most  probable  object  as  the  estimate.  In 
practice,  we  do  not  enumerate  the  probability  for  each  object  that 
agrees  with  the  constraints,  but  determine  the  most  probable  object 
by  the  methods  of  analysis.  But,  in  principle,  it  could  be  done 
that  way. 

According  to  Boltzmann's  identification  of  entropy  S  and 
probability  P  (see  Appendix  A)  P  being  the  normalized  number  of  ways 
something  can  occur,  the  degeneracy 

S  =  log  P  .  (2) 

We  see  that  seeking  the  maximum  probability  object  estimate 
is  equivalent  to  seeking  the  maximum  constrained  entropy : 

_  +S  constrained 

P  =  e  .  (3) 

We  write  the  constraint  in  the  form  of  the  squared  difference  between 
the  given  measured  data  and  the  estimate  to  be  found.  By  subtraction, 
the  constraint  compares  the  calculated  estimate  with  the  original 
measured  data.  Before  comparison,  the  two  quantities  must  be  in,  or 
be  transformed  into,  the  same  space  and  compared  on  the  same  support 
elements.  For  example,  we  may  compare  the  superresolved  object 
estimated  on  a  fine  mesh  with  the  coarser  mesh  measured  image  by 
convoluting  the  estimate  with  the  known  point-spread  function  of 
the  measurement  technique.  This  procedure  reduces  the  number  of 
independent  coordinates  in  the  estimate  and  places  i t  on  the  same 
coarse  mesh  of  the  image  for  comparison.  Thus,  the  estimate  object 
is  degraded  by  the  same  measurement  techniques  that  produced  the  actual 
image.  This  derived  "image"  is  compared  with  the  actual  image  in  the 
constraint  term.  Continuing  our  example,  we  write 
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(4) 


E  =  2  ({1}  -  2pSF.  .  ®  {Ob})2 
i  k  1,K 

In  the  same  way,  we  can  constrain  the  estimated  object  to  agree  with 
an  image  measured  in  Fourier  transform  space.  This  situation  would 
arise,  for  example,  in  the  detection  of  the  partial  coherence  visi¬ 
bility  function  as  is  done  in  radio  telescopy.  Here  we  would  compare 
the  transform  of  the  estimated  object,  degraded  by  the  transfer  or 
filter  function  (the  modulation  transfer  function  (MTF))  of  the 
measurement  technique,  with  the  actual  measured  image  transform. 

Again,  the  estimated  object  is  degraded  in  the  same  fashion  as  the 
actual  image  was.  We  would  then  have  the  constraint 

E  =  2  ({I,  )  “  2  [Fourier  Transform  (i,k)  {Ob.}]  MTF  )2  (5) 

k.  '  k  i  ik 

where  {1}  and  {Ob}  are  in  the  reciprocal  spaces  of  spatial  frequency 
(f  )  and  space  (x) ,  respectively.  Other  permutations  of  the  trans¬ 
form  relation  are  also  possible  in  the  constraint  term.  This  constraint 
term  may  be  viewed  as  an  error  term,  or  a  noise  or  error-tolerant 
term.  Wernicke  and  D'Addaria,1  utilized  it  in  their  formulation  of 
the  ME  problem  with  an  arbitrarily  chosen  multiplier  that  was  not 
clearly  interpreted.  The  constrained  entropy  to  be  maximized  may  be 
written  as 


S({Ob})  =  S ( { Ob } )  -  0E({Ob,l}  )  ,  (6) 

constrained 

where  B  is  a  Lagrange  multiplier  that  is  chosen  so  that  the  constant  of 
Eq,  (4)  is  satisfied.  The  multiplier  8  determines  the  weight,  or 
Importance,  of  the  constraint.  In  general  we  can  make  8  dependent  on 
i  and  multiply  8^  inside  of  in  Eq.  (4);  however,  for  simplicity  we 
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use  a  constant  8^  =  6.  The  constraint  terra  E  may  be  viewes  as  an  energy 
term  since  it  is  a  quadratic  form;  this  is  in  analogy  to  physical  systems 
which  often  have  quadratic  potential  and  kinetic  energies.  We  chose 
this  quadratic  form  to  emphasize  this  analogy  for  a  useful  interpreta¬ 
tion  of  the  multiplier,  and  also  for  technical  and  mathematical  reasons. 
Because  we  want  the  energy  to  become  small  in  our  extremum  problem, 
any  even  function,  such  as  the  absolute  value  or  En ,  where  n  is  an 
integer,  would  do.  But  the  quadratic  form  makes  an  algebraic  and 
algorithmic  simplification. 

Maximizing  the  constrained  entropy  is  the  same  as  minimizing  its 
negative,  and  we  minimize  a  function  which  we  call  $F: 

UF  »  PE  -  S  .  (7) 

Equation  3  for  the  probability  of  the  object  may  now  be  written 


P  =  e“0F  .  (8) 

Since  maximizing  the  probability  is  the  same  as  maximizing  its 
monotonic  mapping  by  the  logarithm,  we  minimize  BF  (Eq.  7) 
directly : 

MIN  6F{0b}  =  MIN ( 6E-3)  =  MIN  [B ( { I }  -  PSF  ®  {Ob})2  -  s].  (9) 
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To  illustrate  more  explicitly,  we  can  choose  an  explicit  form  for 

the  entropy  such  as  S  =  +  log  {Ob}  (suitable  for  Gaussian  amplitude 

statistics),  or  S  =  -{Ob}log{Ob} (suitable  for  Poisson  statistics  as 

we  show  in  Appendix  B) .  The  convolutional  PSF  might  be  Gaussian  or 
.  2 

(S,lP.  -)  ,  depending  on  the  details  of  the  measurement  conditions, 
x  2  2 

(We  choose  the  (sin  x)/x  form  in  our  later  examples.) 


25 


To  minimize  Eq.  (2),  we  set.  the  partial  derivatives  to  zero, 


3F 

3  (Ob) 


0 


(10) 


and  solve  for  {Ob}. 

These  equations  can  be  solved  iteratively  by  a  variety  of  techniques. 

2 

We  have  used  a  "natural  iteration  technique"  developed  by  Kikuchi , 

which  guarantees  convergence  irrespective  of  the  initial  guess,  and 

3 

the  Newtop-Raphson  technique  . 

B.  A  THERMODYNAMIC  ANALOGY 

When  we  examine  Eq.  7,  we  note  that  it  has  the  form  of  the 
function  that  is  minimized  in  the  canonical  ensemble  treatment  of 
statistical  mechanics.  Using  this  analogy,  we  can  go  one  step 
further  and  call  F  (defined  in  Eq.  7)  the  free  energy.  In  our 
interpretation,  we  identify  the  constraint  energy,  Eq  4  or  5, 
with  the  internal  energy  of  the  system,  and  8  with  1/T,  the 
reciprocal  temperature.  Therefore,  in  our  formulation,  of  the  ME 
problem  should  really  be  called  a  Minimum-Free-Energy  (MFE)  problem. 
One  may  call  the  entire  expression,  Eq  7,  entropy  or  free  energy, 
depending  on  whether  one  views  the  system  as  closed  or  open,  and 
whether  the  problem  is  viewed  from  the  microcannonical  or  cannonical 
point  of  view.  In  any  case,  the  same  function  is  to  be  minimized. 

The  important  point  is  that  in  either  case  the  multiplier  (5  can  be 
associated  with  the  concept  of  temperature  and  therefore  fluctuation 
or  noise  as  we  shall  see  below. 

This  analogy  immediately  indicates  what  value  B  should  take 
to  satisfy  the  constraints.  From  statistical  mechanics  we  know 
that  the  system  takes  the  state  of  the  lowest  energy  at  T  =  0. 

This  means  that  the  value  of  8  that  satisfies  the  constraint  is 


26 


r 


1 


ini  inity.  When  r-  =  0  (i.e.,  T ■■*'")  ,  the  entropy  predominates  in  Fq  7; 
thus  the  object  viables  correspond  to  those  values  that  maximize  S 
without  the  constraints,  thereby  producing  a  flat  featureless 
distribution  for  the  object.  As  i‘  increases,  the  F.  term  increases 
its  contribution  and  is  not  at  its  lowest  zero  value. 

Analogy  with  the  statistical  mechanics  enables  us  to  use  the 
re  1  at i on 


JL 

i»T 


(il) 


Ibis  relation  can  he  used  as  a  check  of  numerical  computation, 
liquation  11  can  be  used  as  follows.  If  we  can  expand  F  near  T  =  0 
( .•=  '")  as 


F  =  F  -  TF. 
o  1 


t2f„ 


(12) 


then 


S 


hF 

•IT 


+  2TF  +  .  .  . 


(H) 


and 


2 

F  =  F  +  TS  =  F  +  T  F_+  ....  (14) 

o  2 

Because  we  know  F  =  0  at  T  =  0,  Fq .  14  leads  to  F  =  0. 

Equation  13  shows  that  F  is  the  residual  entropy,  i.e.,  the  entropy 
value  at  T  =  0.  The  behavior  of  F  and  S  near  T  =  0  is  shown  in 
Figure  2. 
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Figure  2.  Behavior  of  free  energy  and  entropy  at  low  temperatures. 

(a)  Normal  behavior;  (b)  Apparent  behavior  in  our  form 
ulation  due  to  suppression  of  a  large  constant  positive 
entropy  term. 


For  the  region  of  6  we  used  in  one  of  our  examples,  the 
expansion  (Eq.  12)  does  not  hold  exactly,  and  the  relation  was 


F  = 


c 


with  a  small  a.  In  this  region,  for  example,  it  was  numerically 
verified  that 


(16) 


S  =  (14tt)TUF1 


and 


E  = 


hold  with  a  =  0.05.  The  relations  in  Eq  16  are  derived  from  the 
thermodynamic  relations,  Eq.  11  and  14. 

C.  FLUCTUATIONS  OR  NOISE 

Our  thermodynamic  analogy  gives  a  further  insight  into  the 
problem  of  the  meaning  of  the  energy  constraint  term  and  its 
multiplyer  6.  Wernecke  and  D'Addario,1  noted  that  if  the  measured 
data  values  in  the  constraint  are  independent  zero  mean,  random 
variables  with  known  a  priori  variances  a*,  the  constraint  E  could 
be  written 


E  /Estimated.  -  Measured. 

- 1 - L 

i  \  oi 

the  equality  holding  provided  M  is  large.  The  computed  trial 

solutions  for  various  multiplyers  3,  to  find  the  one  that  permits 

Eq.  17  to  be  satisfied  with  "sufficient"  accuracy.  They  note  that 

it  might  not  be  possible  to  satisfy  Eq .  17  no  matter  how  large  6  if 

2 

the  measurements  are  too  noisy  or  the  variances  {a^  are  assumed  to 
be  unrealistically  small.  They  give  an  estimate  for  a  trial  guess 
for  3  from  the  case  of  a  single  measurement  of  the  total  object 
intensity  m  by  setting  3F/90b  =  0  (Eq.  10),  and  setting  the  difference 
between  measured  and  estimated  values  equal  to  o^. 


=  M 


(17) 
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(18) 


6  = 


2/1  + 


which  varies  from  zero  to  a  maximum  of  one-half.  They  increase  0  to 

satisfy  "increasing  accuracy"  in  Eq.  17,  but  no  further  rule  or 

interpretation  is  given. 

4 

Gull  and  Daniell,  seemingly  unaware  or  Wernecke  and  D'Addario's 

prior  work\  similarly  assume  the  data  to  have  Gaussian  errors  and 

2 

note  that  then  the  term  E  (Eq.  17)  is  a  x  distributed  statistic  and 
that  the  expected  value  of  x2  is  equal  to  the  number  of  data  values  M. 
They  automatically  increase  B  in  their  iterative  solution  until  this 
equality  is  achieved.  No  interpretation  of  0  is  offered  there  to 
relate  it  explicitly  to  the  noise. 

In  thermodynamics  the  fluctuation  of  the  energy  in  a  canonical 
ensemble  is  given  by 


6E2  *  k  T2C„  =  k~  ’  (19> 

v  e2 

where  we  will,  in  our  analogy,  put  Boltzmann's  constant  k  =  1.  The 
specific  heat  is  given  by 


C 


V 


_d_E 

3T 


(20) 


These  two  relations  are  general  and  do  not  depend  on  the  assumption 
of  Gaussian  independent  error  or  noise  fluctuation  statistics,  or 
the  validity  of  the  equipartitioning  of  energy  for  the  problem  at 
hand. 
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If  equlpartition  were  to  hold  for  the  ME  problem,  we  would  have  a 
simple  theory  for  the  specific  heat  and  the  fluctuations.  Equipartition 
would  indicate  that  the  fluctuation  or  noise  power  is  distributed  uni¬ 
formly  among  all  the  pixels,  or  mesh  points,  with  1/(28)  in  each  pixel, 
each  pixel  being  a  degree  of  freedom  in  our  present  equipartition 
analogy.  Equipartition  would  hold  if  the  quadratic  energy  term  were 
the  only  factor  in  the  probability  or  free  energy  expression  (Eq.  8): 


P  = 


-8 

e 


(21) 


where  A’  is  the  difference  between  estimate  and  observation.  We  could 
set 


8 


2K) 


(22) 


and  have  a  simple  relationship  between  8  and  the  noise  by  relating 
to  the  noise.  We  would  employ  a  single  6  for  simplicity  rather  than 
separate  multipliers  8^.  But  the  nontrivial  entropy  S  in  Eq .  8  spoils 
the  relation  (Eq.  21)  for  equipartioning  except  for  special  asymptotic 
limiting  cases: 


P 


-6 

=  e 


+S 

e 


(23) 


In  general,  we  must  numerically  calculate  C v>  which  is  B  dependent, 
and  then  use  Eq.  19  to  relate  the  value  of  B  to  the  fluctuation  or  noise 
8E.  The  numerical  calculations  are  required  because  the  series  expan¬ 
sions  do  not  hold  exactly  as  we  pointed  out  in  Eq.  15  and  we  have  not 
yet  developed  a  theory  of  the  specific  heat  for  this  problem.  We 
emphasize  that  the  relation,  Eq.  19,  does  not  depend  on  the  noise  being 
of  the  nature  of  Gaussian- Independent  fluctuations,  but  is  very  general. 
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Some  comments  may  clarify  the  meaning  of  the  energy  fluctuation 
SE  (Eq.  191.  It  is  important  to  note  that  although  the  individual  terms 
within  the  energy  constraint  expression,  Eqs.  4  and  5,  ordinarily  have 
the  meaning  of  physical  energy  (power,  brightness,  intensity,  etc.  or 
their  spectral  counterparts),  we  do  not  refer  to  these  physical 
energies  in  the  phrase  "energy-fluctuation."  Rather,  it  is  our  artifi¬ 
cial  analogy  with  the  ubiquitous  quadratic  energy  form  such  as  occurs 
in  the  simple  harmonic  oscillator  that  permits  us  to  call  the  entire 
constraint  term  an  "energy"  in  our  analogy.  It  is  the  fluctuation  of 
of  the  entire  term  that  is  determined  by  Eq.  19. 


If  we  write  the  measured  quantities  m^  and  the  estimated  quanti¬ 
ties  on  the  same  grid  as  we  may  write  Eq .  19  as  follows.  First  we 
write  fiE  noting  that  averages  are  done  on  but  not  on  the  given  m^: 


We  then  must  square  and  average  to  get 


(25) 


This  complicated  collection  of  higher  moments  can  be  simplified  when 
the  statistical  distribution  of  the  estimated  object  (n)  is  known.  For 
example,  if  it  is  known  to  be  negative-binomially  distributed  (see 
Appendix  B)  the  second  moment  may  be  calculated  as 


(26) 
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Higher  moments  for  this  and  other  distributions  may  thus  be  calculated 
separately  to  simplify  Eq.  24.  Arbitrary  definitions  for  the  fluctua¬ 
tion  such  as 


or  the  root-mean-square  fluctuation 


(27) 


may  be  usefully  employed  as  well  to  simplify  the  results. 

D.  THE  BURG  METHOD 

The  Burg,  or  one-dimensional  Gaussian  statistics  ME  method  and 
related  algorithms  has  been  shown  to  be  equivalent  to  an  all-pole  net¬ 
work  model.  These  ideas  have  inflit  rated  estimation  theory  from  the 
discipline  of  control  theory.  Statistics  other  than  Gaussian  do  not 
necessarily  produce  all-pole  solutions.  There  are  those  who  hold  that 
if  a  process  does  not  have  such  a  network  model  representation,  there 
is  no  physical  reality  involved.  (See  Appendix  B.) 

In  the  one-dimensional  Gaussian-Burg  problem,  as  generally  practiced, 
a  fixed  nuu.oer  of  autocorrelations  or,  alternatively,  lags  in  the 
temporal  data  string  is  chosen;  this  number  is  also  the  number  of  poles 
that  are  determined  in  the  spectrum.  The  location  and  magnitude  of 
these  poles  determine  the  position  and  strengths  of  the  spectral  peaks. 
Where  they  fall,  how  close  they  are  to  each  other,  and  how  strong  they 
are,  are  all  the  result  of  the  calculation.  In  our  formulation  of  the 
Gaussian  problem  we  choose  the  fineness  of  the  mesh  upon  which  the 
estimates  are  constructed  and  upon  which  they  are  constrained  to  stand. 
Only  the  strength  of  the  estimates  at  these  mesh  points  is  calculated. 
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E.  BAYESIAN  INTERPRETATION  OF  THE  ME  METHOD 

The  Bayesian  approach  may  be  viewed  as  a  method  for  computing  the 
conditional  probability  or  probability  distribution  of  a  state  of  nature 
(or  a  cause)  given  the  measured  data  (or  an  effect).  If  we  write  P^(B) 
for  the  conditional  probability  or  probability  distribution  of  B  given 
A,  then  Bayes'  theorem  may  be  written 

P ( B)  P  (A) 

VB)  =  P(A)—  •  (28) 


In  the  ME  problem  we  seek  to  determine  the  most  probable  object 
given  an  image.  That  is,  the  maximum  of  Pj.(Ob), 


Pj (Ob)  = 


P(Ob)  Pob(i) 

pJTj 


(29) 


Now  the  probability  distribution  of  images  given  an  object  P^U)  is 
determined  by  the  instrumental  and  transmission  channel  characteristics, 
including  noise,  by  the  PSF  for  example.  The  probability  distribution 
of  the  images  P ( I )  is  a  constant  with  respect  to  the  maximization  over 
objects.  Wernecke  and  D'Addario*  questioned  the  possible  relationship 
between  the  Bayesian  or  maximum  a  posteori  probability  method  (MAP)  and 
ME,  but  commented  that  P(ob)  was  a  stumbling  block  and  that  to  use  MAP 
one  must  make  a  model  for  the  statistics  of  the  object  P(ob).  From  our 
point  of  view,  the  distribution  P(Ob)  is  given  by  the  physical  statistics 
of  the  problem.  For  example  for  photon  signals,  the  Bose-Einstein 
statistics  apply  (as  discussed  in  Appendix  B) .  With  these  identifica¬ 
tions  and  interpretations,  the  maximization  of  P^fOb)  for  the  most 
probable  object  done  in  the  ME  method  may  be  viewed  as  a  maximization  of 
Bayes'  equation,  a  MAP  method. 
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SECTION  4 


EXAMPLES  OF  ME  ESTIMATION 

We  illustrate  the  ME  method  with  some  examples  of  one-  and  two- 
dimensional  problems  and  modeling  both  Gaussian  and  Poisson  statistics. 
Although  we  conceived  of  these  problems  as  photon  imaging  problems  to 
address  the  issue  of  imaging  with  thin,  sparse  arrays,  it  will  be  obvious 
that  by  changing  the  names  and  meanings  of  the  variables  a  wider  variety 
of  problems  is  implicitly  included. 

A.  ONE-DIMENSIONAL  EXAMPLE 

In  the  first  example,  a  one-dimensional  photon  signal  is  received 
by  an  array  of  antennas  or  apertures.  The  signal  is  supposedly  known 
to  be  the  far-field  radiation  pattern  of  the  unknown  object  and  is  the 
mutual  or  partial  coherence  function;  i.e.,  by  the  Zernike-Van  Cittert 
theorem,  it  is  closely  the  Fourier  transform  of  the  unknown  intensity 
distribution  pattern  of  the  object.  The  array  samples  a  small  discrete 
subset  of  object  transform  values  and  the  ME  method  estimates  the  object. 
The  constraint  energy  term  E  in  this  example  takes  the  squared  difference 
between  the  measured  values  and  the  estimated  K  transform  values,  cal¬ 
culated  from  the  estimated  normalized  object  intensities  {p^}  on  a  fine 
mesh  labeled  i  (1  <  i  <  i  max): 

E  Kl  -  Z  (Z*i  c°s  (2'f<k>  •  »i>  -  )2  no, 

k-1  V(l)  ' 

(k) 

The  spatial  frequencies  f  in  the  incoherent  case  are  a  measure  of  the 
separation  of  pairs  of  elements  in  the  array.  The  number  of  pairs  K 
will  be  less  than  the  number  of  mesh  points  i  max,  so  we  will  be  asking 
for  a  superresolution.  We  use  a  cosine  transform  here  and  create  an 
artificial  extension  of  the  object  to  make  an  even  tunction.  This  was 
done  for  convenience. 


15 
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Notice  that  we  are  constraining  in  transform  space  but  estimating 
J p '  ^  in  object  space  in  this  example.  An  additional  constraint  is 
placed  on  the  overall  object  intensity  so  we  will  estimate  only  the 
relative  shape  of  the  object. 


E  pi  -  1  ■ 


The  free  energy  times  6  becomes 

6F  =  BE  { p± }  +  ^  (piHn  p.  -  pj  +1  +X  (l  -  £  p£  )  .  (32) 

i  i 

This  function  is  to  be  minimized  with  the  Lagrange  multipliers  B  and  X 
for  the  two  constraints.  The  minimum  is  obtained  when 


1  3Pj 


(33) 


is  satisfied  for  all  p.  This  simultaneous  set  of  equations  to  be  solved 

2 

for  p^  is  treated  by  a  special  kind  of  iteration  scheme. 

We  can  guarantee  the  desired  positivity  of  the  {pj}  by  writing  it 
as  p  in  our  formulation.  This  is  a  natural  way  that  does  not 

simply  reject  or  set  to  zero  negative  values  as  sometimes  reported  in 
other  nonlinear  schemes. 

In  this  example  we  estimate  p^  on  a  support  of  imax  =  50  points. 

We  observe  the  first  seven  of  fifty  spatial  frequencies.  In  our  simula¬ 
tion  this  would  represent  a  potential  seven-fold  superresolution  at 
large  values  of  8  if  the  data  allow  it.  To  further  illustrate  the 
possibilities  of  the  method,  the  sixth  of  these  seven  spatial  frequency 
components  was  suppressed,  simulating  a  particular  sparse  array.  The 
diffraction,  band  limited,  or  resolution  limited  object  estimate  (further 
limited  by  the  missing  component)  that  was  obtained  by  Fourier  trans¬ 
formation  of  the  six  components  of  given  measured  image  data,  is  shown 
in  Figure  3. 
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FiKure  3.  The  diffraction-limited  object  estimate  (Fourier  transform  of 
the  observed  or  measured  image  data)  further  degraded  by  one 


missing  Fourier  component. 
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A  series  of  fifty  dimensional  estimates  for  the  objects  {pi}  is 
shown  in  Figure  4  for  seven  different  values  of  B  ranging  from  one-half 
to  120.  It  can  be  seen  that  smaller  values  of  0>  representing  a  lower 
known  confidence  in  the  accuracy  of  the  data  or  equivalent  by  a  lower 
S/N,  produce  a  smoother,  flatter  estimate,  while  large  values  of  6  pro¬ 
duce  sharper,  more  detailed  estimates.  The  diffraction  object  estimate 
falls  somewhere  between  the  curves  for  0  =  1/2  and  6=1,  but  there 
is  no  curve  of  definite  6  that  can  reproduce  the  diffraction  pattern. 

This  behavior  can  be  seen  more  clearly  in  Figure  5.  A  curious  behavior 
can  be  noted  near  the  peak  below  J  =  15  in  Figure  4.  The  0  =  120 
estimate  takes  a  slightly  lower  peak  value  and  the  entire  peak  struc¬ 
ture  moves  slightly  to  the  left.  Why  this  is  a  free  energetically 
favorable  situation  is  not  obvious.  This  example  is  not  a  severe  test 
of  the  ME  method  as  there  is  not  much  structural  detail  at  the  fineness 
of  the  support  of  50  points.  The  next  example  will  show  diffraction- 
limited  behavior  more  explicitly. 

B.  TWO-DIMENSIONAL  EXAMPLES 

In  the  two-dimensional  example  we  now  turn  to,  the  energy  constraint 
is  written  entirely  in  the  spaces  of  the  object  and  the  image,  unlike 
the  previous  example  where  it  was  written  in  the  transform  spaces,  of 
object  and  image.  In  these  examples  we  use  the  Log  B  formulation  of 
ME  appropriate  for  Gaussian  amplitude  or  exponential  intensity  statistics. 
We  may  imagine,  in  these  examples,  that  the  measured  data  are  given 
directly  as  an  image  in  real  space  or  that  it  has  been  transformed  to 
image  space  from  measurements  in  Fourier  space.  The  form  of  the  con¬ 
straint  was  described  in  Eq.  2.  We  chose  a  convolutional  point  spread 
function  of  the  measurement  to  be  of  the  form 
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Figure  4.  A  family  of  ME  object  estimates  for  several  values  of  the 
parameter  6,  compared  with  the  diffraction  object 
est imate . 
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40 


Figure  6.  Two-dimensional  test  object  at  the  resolution 
separation  of  the  aperture. 
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representing  diffraction  spreading  by  a  limiting  square  aperture.  The 
scale  of  diffraction  spreading  was  chosen  to  be  a  factor  of  four  in 
each  dimension.  That  is,  the  number  of  independent  variables  needed  to 
completely  describe  the  image  was  4  x  4  =  16  times  less  than  required  t 
describe  the  object  from  which  it  was  mapped.  The  image  can  be  com¬ 
pletely  and  uniquely  represented  on  a  grid  4x4  times  coarser  than  the 
object . 

In  Figure  6  we  show  a  two-dimensional  test  object  of  two  isolated 
spikes  supported  on  single-grid  locations  separated  by  three  empty  grid 
cells.  Note  that  they  are  on  a  repeat  period  of  four  grid  cells.  The 
heights  of  the  spikes  are  in  the  ratio  of  255  to  128,  and  the  base  of 
the  figure  is  unit  height.  The  4x4  times  degraded  diffraction- 
limited  image  is  shown  in  Figure  7  on  an  appropriately  reduced  coarse 
support  grid.  If  the  two  spikes  of  the  object  in  Figure  6  were  of  the 
same  height,  then  the  image  in  Figure  7  would  be  said  to  have  "resolved 
the  spikes  according  to  Rayleigh's  criterion,  but  as  the  spikes  were  of 
different  heights,  they  are  just  unresolved.  For  convenience,  both  of 
visualization  and  of  computation,  the  image  is  redrawn  on  the  same  fine 
grid  of  the  object  in  Figure  8.  These  interpolated  image  values  are 
redundant  and  contain  no  additional  information.  They  are  uniquely 
determined  by  the  values  on  the  coarse  grid  in  Figure  7. 

The  ME  estimate  of  the  object  was  made  on  a  20  x  20  section  of 
this  image  on  the  fine  grid  so  the  degree  of  superresolution  of  the 
image  was  a  factor  of  4  x  4.  The  result  for  the  case  6  =  10^  is  shown 
in  Figure  9.  The  two  peaks  are  clearly  resolved  and  some  widening  at 
their  bases  can  be  seen. 

The  iteration  process  was  terminated  by  testing  for  when  the  sum 
of  the  absolute  differences  between  the  estimated  values  in  two  suc¬ 
cessive  iterations  was  less  than  or  equal  to  10_^,  a  number  chosen 
arbitrarily.  Some  arbitrary  choice  is  needed  to  terminate  the  process 
of  ev.-r  diminishing  returns.  At  this  test  value  the  peak  heights  were 
254  and  96.  In  the  natural  iteration  method2  used  in  these  examples, 
the  free  energy  necessarily  decreases  monotonical ly  with  iteration  for 


42 


all  6  values.  The  test  described  above  was  found  to  decrease 
exponentially  but  only  for  6  below  a  certain  value,  depending  on  the 
details  of  the  problem.  Above  that  B  value,  the  convergence  was 
impract ically  slow  and  erratic.  The  Newton-Raphson  method  does  not 
guarantee  convergence  from  an  arbitrary  initial  guess  and  also  fails  to 
converge  for  values  of  3  greater  than  a  certain  maximum  value. 

The  second  two-dimensional  example  we  present  is  similar  to  the 
previous  one,  except  that  here  the  image  is  derived  from  the  test 
object,  shown  in  Figure  10,  whose  two  spikes  are  at  half  the  previous 
separation,  i.e.,  at  half  the  resolution  separation  of  the  aperture. 

The  unresolved  image  is  shown  in  Figure  1 1  on  the  appropriate  coarse 
grid.  Figure  12  shows  the  fine  mesh  representation  of  that  image.  The 
ME  estimate  of  the  object  is  shown  in  Figure  13  for  3  =  10"*.  The 
unresolved  peaks  in  the  image  are  now  clearly  resolved  and  the  peak 
values  are  255  and  126.  A  saddle  point  of  height  40  sits  between  the 
peaks,  and  there  is  a  slight  width  at  their  base.  The  displacement  of 
the  peaks  in  this  figure  is  merely  a  result  of  an  improper  plotting  pro¬ 
gram  instruction  and  should  be  ignored. 

The  last  two-dimensional  example  we  present  is  constructed  from 
binary  black  and  white  alphabetical  object  shown  in  Figure  14(a),  con¬ 
structed  on  a  20  x  20  grid.  The  four  by  four-fold  diffraction  image 
is  seen  in  Figure  14(b).  The  400-dimensional  ME  estimate  of  the  object 
is  shown  in  Figure  14(c)  for  the  case  of  6  =  107.  The  convergence  was 
so  slow  and  erratic  at  this  high  value  of  6  that  Figure  14(c)  shows 
rather  unconverged  results;  that  is,  the  fluctuations  between  successive 
iterations  were  still  large.  However,  the  superresolution  was  achieved. 
But  the  gray  level  or  intensity  values  in  the  ME  estimate  are  still 
uncertain  in  Figure  14(c). 
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Figure  10.  Test  object  at  half  the  resolution 
separation  of  the  aperture. 


A  7 


ine  mesh  representation  of  the  image  in  figure  1 
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SECTION  5 


THE  ME  CONFIDENCE  OR  RELIABILITY  ESTIMATE 

When  an  object  is  estimated  using  the  ME  method,  the  question 
naturally  arises  concerning  how  much  confidence  is  to  be  placed  in  the 
estimate.  In  this  section  we  describe  the  theoretical  formulation  for 
the  calculation  of  the  confidence,  and  we  give  some  examples  for  a 
variety  of  measurement  conditions. 

A.  THE  THEORETICAL  FORMULATION 

The  ME  procedure  finds  the  one  pattern  which  maximizes  the  proba¬ 
bility  P  of  different  patterns  appearing  as  the  object.  The  estimated 
pattern  object  {Ob}  designated  by  its  set  of  normalized  variables 
{p^°^}.  The  variables  { p ^ }  are  constrained  to  obey  a  set  of  subsidiary 
conditions . 

Near  the  maximum  of  the  probability  P{p^°^},  P  behaves  quadratically 
and  we  may  assume  it  behaves  as  a  multivariate  Gaussian  distribution. 

We  write  symbolically, 


P{pt}  =  P{Pi(°)} 


(34) 


where  the  {a^}  are  the  standard  deviations  of  the  estimate  to  be 
calculated.  This  is  illustrated  in  Figure  15  as  though  it  were  a 
one-dimensional  problem. 

We  will  ultimately  use  the  calculated  {a,}  in  determining  the 
confidence  of  the  ME  estimated  values  {p^  A  small  a,  referring  to 

Figure  15,  means  that  only  the  values  of  { p }  nearly  equal  to  ip*'0'*’ 
have  a  high  probability  P { p } ;  thus  we  can  rely  on  the  estimate  more 
than  if  the  { a }  were  large.  More  conventionally,  we  -iould  normalize 
the  probability  to  unit  integrated  volume  ("area"  in  Figure  15)  and 
consider  it  to  be  a  probability  density  function.  Then  at  ip^°*  *l*o}, 
the  integrated  area  (twice  the  shaded  area  in  Figure  15)  would  represent 

53 


>*ec£dim;  paqs  blank -not 


10435-  15 


piM 


Figure  15.  Schematic  one-dimensional  illustration  of  the  assumed 
Gaussian  behavior  of  P  near  its  maximum.  If  P  is 
normalized  it  becomes  the  probability  density 
function  and  the  integrated  "area"  becomes  the 
cumulative  probability. 

the  probability  that  Ob{p}  lies  within  the  range  (p^0^±»a).  In  one 
dimension  rhe  probability  would  be  0.68.  We  could  say  in  that  case 
that  the  "confidence  interval"  p^°^  +o  has  a  "confidence  coefficient" 
of  68%  and  that  the  "confidence  limits"  we  set  are  +1*0. 

Our  problem  differs  from  the  usual  statistical  problem  in  several 
respects.  In  our  problem  both  the  mean,  or  more  correctly,  the  most 
probable  fp°},  are  estimated  as  parameters  and  not  measured  as  a 
stochastic  variable,  and  {o}  are  also  calculated  rather  than  measured. 
Furthermore  we  derive  the  actual  probability  distribution  function  P  or 
density  function  P^  from  the  ME  theoretical  formulation,  although  we 
approximate  it  by  a  Gaussian  near  its  maximum.  We  have  typically  only 
one  (multidimensional)  measurement  from  which  we  calculate  both  a 
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(multidimensional)  set  of  {p°}  and  one  of  { o }  along  with  the  probability 
distribution  of  { p }  as  well.  We  could  determine  the  exact  shape  of  the 
multivariate  function  P{p}  and  we  would  not  have  to  approximate  it  near 
the  maximum  by  a  multivariate  Gaussian,  but  the  computational  problem 
is  too  large  to  make  that  option  practical. 

The  schematic  quasi  one-dimensional  drawing  of  Figure  15  can  be 
somewhat  misleading,  so  we  show  in  Figure  16  a  two-dimensional  Gaussian 
distribution.  The  two-dimensional  example  is  easily  interpreted 
geometrical lv  and  is  readily  generalizahle  to  the  multidimensional  case 
of  interest  in  this  problem.  in  two  dimensions. 


where  p  is  the  correlation  parameter  that  characterizes  the  stochastic 
dependence  between  p^  and  p Except  when  p  =  0,  the  axes  of  the 
contour  ellipses  are  not  parallel  to  the  coordinate  system  p^ ,  p^,  and 
the  length  of  the  axes  of  the  ellipses  depend  on  both  {n't  and  ,  . 

Equation  35  can  be  normalized  so  that  its  two-dimensional  integral  is 
unitv  to  make  a  probability-density  function: 

Pd(Pi,P2)  -  - ;  —  exp  (as  in  (35)!  .  (36) 

2-.y,2  yjTT'2 

Curves  of  equal  probability  on  planes  parallel  to  the  PjP,  plane  are 
ellipses  on  the  P  surface  of  Figure  16,  as  can  be  seen  from  the 
exponent  of  Eq .  35.  These  ellipses  may  be  projected  onto  the  Pj,p., 
plane  as  shown  in  Figure  17. 


Figure  17.  Concentration  ellipses  of 
bivariate  normal  density. 
These  are  contours  of  equ 
probability  P  projected 
the  P^?2  plane- 
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2 

given  by  C  ,  is  therefore  equal  to  the  P-fractal  of 

2 

Xp  distribution  function  for  two  degrees  of  freedom 


the  cumulative 

f. 


Xp  (f=2) 


(40) 


Equation  39  is  exactly  the  same  ellipse  as  before  (Eq.  37),  which  was 
expressed  in  terms  of  the  correlated  variables  p.  The  only  purpose 
of  the  transformation  here  is  to  show  the  equivalence  as  a  sum  of 
squares  of  independent  variables.  But  there  are  other  uses  of  the 
transformation  which  we  will  describe  below. 

For  three  variables,  (p^,  P2»  P^) ,  the  trivariate  Gaussian  distri¬ 
bution  p  cannot  be  drawn,  but  we  show  in  Figure  18  one  three-dimensional 
concentration  ellipsoid  represented  in  its  rotated  coordinates  <q^  . 

The  lengths  of  the  three  principal  axes  are  shown  as  well.  The 

probability  that  a  point  is  inside  this  particular  ellipsoid  is  given 

2 

by  the  P-fractal  of  the  cumulative  xp  distribution  function  for  f  =  i 

9  2 

degrees  of  freedom.  We  have  chosen  =  x  =  1  for  Figure  18.  Tables 

2  P 
of  x  show  that  the  confidence  level  P  here  is  20%. 
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Figure  18.  A  concentration  ellipsoid  for  the 
trivariate  Caussian  probability 
distribution. 


39 


In  one  sense,  the  relation  (Eq.  40)  or  its  multidimensional 
equivalent  for  larger  number  of  degrees  of  freedom,  f,  is  a  solution  of 
the  problem  of  finding  the  confidence  region  for  the  ME  estimates.  The 
fj  }  and  {q}  are  calculated  (or  equivalently  the  (pi  and  {o})  for  the 
object  estimate  {p^°^}  and,  for  a  chosen  confidence  level,  say  P  =  95%, 
one  can  compute  whether  any  given  point  {pi,  which  represents  a  multi¬ 
dimensional  estimate  Ob{p},  lies  within  the  hyperellipsoid  given 
by  P.  But  a  graphical  or  pictorial  representation  of  this  condition, 
even  in  two-dimensional  cases,  is  already  problematical. 

Consider  a  two-dimensional  ME  estimate  where  only  two  parameters 
are  being  estimated,  e.g.,  p^  and  p9.  Suppose  they  are  to  be  repre¬ 
sented  graphically  as  in  Figure  19(a)  as  the  strengths  of  the  vector 
components  (or  the  function)  { p ^ } .  (Perhaps  this  would  seem  a  more 
reasonable  procedure  if  we  had  a  larger  number  than  merely  two.)  In 
Figure  19(a)  suppose  we  attempt  to  show  by  two  individual  "error  bars" 
or  confidence  intervals,  a  confidence  band  attempting  to  represent 
one  of  the  accumulation  ellipses  of  Figure  17,  say  for  the  95% 
confidence  level  P.  This  is  replotted  in  Figure  19(b)  around 
(p^°\  P2^°^)*  19(b)  objects  {p^f  are  described  by  the  coordinates 

p^  or  the  vector  fp,}  in  two-dimensional  space.  Imagine  a  point  going 
around  the  circumference  of  that  ellipse,  its  p^  and  p^  values  going 
from  maximum  to  minimum  separately,  but  in  a  concerted  fashion.  If  we 
label  the  maximum  values  of  p^  by  1  and  p2  by  2  in  Figure  19(b),  we 
can  replot  these  points  on  the  confidence  intervals  in  Figure  19(a). 
Notice  that  points  outside  the  ellipse,  such  as  points  4  or  5,  are 
still  represented  by  points  within  these  confidence  intervals  which 
were  determined  by  the  maximum  extent  of  the  ellipse!  By  admitting 
points  sucli  as  4  or  5  we  have  seemingly  enlarged  the  confidence  region 
in  a  distorted  way  to  include  some  portion  of  fractal  regions,  higher 
than  our  chosen  95%  value.  Instead  of  the  desired  correct  ellipse, 
we  have  represented  a  circumscribed  rectangle.  There  is  no  precise 
way  to  represent  the  ellipse  or  hyperellipsoid  in  the  fashion  of 
Figures  19(a)  or  19(c)  as  confidence  bands.  An  inscribed  reC-angle 
for  example,  instead  of  a  circumscribed  one,  would  indeed  decrease,  or 
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perhaps  even  eliminate  improperly  included  extraneous  regions,  but  it 
would  cut  out  some  of  the  ellipse  as  well. 

We  choose  a  graphical  method  which  will  at  least  represent  the 
ellipse  or  hyperellipsoid  exactly.  Yet  it  does  not  always  provide 
visual  understanding  of  whether  a  given  object  estimate  point  in  multi¬ 
dimensional  F  space  with  coordinates  {p^},  lies  inside  or  outside  a 
particular  accumulation  hyperellipsoid  by  merely  displaying  confidence 
bands  laid  out  as  one-dimensional  graphs  of  the  strength  of  the 
components  of  the  vector  fp  *  {confidence  interval}  (except  in 

some  interesting  special  cases) . 

The  method  we  choose,  following  our  present  geometrical  interpre¬ 
tation,  is  to  use  the  endpoints  of  the  ellipse  or  hyperel lipsoid,  i.e., 
the  lengths  along  the  principal  axes,  rather  than  a  circumscribed 
rectangle  as  in  the  example  of  Figure  19(b).  For  any  desired  ellipse, 
these  lengths,  which  are  proportional  to  the  variances  •'  calculated  in 
rotated,  uncorrelated  q  space,  are  then  linearly  projected  onto  p  space 
as  components,  to  be  used  as  confidence  intervals.  In  Figure  20(a)  we 
show  such  a  projection  for  the  two-dimensional  case. 

The  endpoint  q^  along  the  largest  axis  of  largest  o q ^  is  projected 
as  [pj-P2^°^l  along  the  p^  axis  and  as  {P2_P2^°^^  along  the 
p9  axis.  Similarly,  the  endpoint  q2  of  the  smallest  axis  of  is 

pro  jected  as  Ip^-p^°^]^^  and  [ P2-P2 1  ^  •  1°  Pigure  20(b)  we  plot 

the  amplitudes  of  the  projected  components  as  confidence  intervals 
separately  for  the  largest  and  for  the  smallest  axes.  Note  that,  unlike 
the  method  of  Figure  19,  we  now  have  two  separate  confidence  bands, 
one  associated  with  each  of  the  two  principal  axes.  In  Figure  20(c)  we 
show  t he  generalization  to  the  multidimensional  case  of  N  estimates 
or  variables  fp^l,  lii^N.  There  are  a  set  N  different  confidence 
hands.  Often  only  one  or  only  the  first  few  are  of  any  significant 
magnitude,  thereby  greatly  simplifying  the  problem  of  interpretation. 

The  scheme  shown  in  Figure  20(c)  is  the  one  we  adopt  in  this 
report  for  our  49-dimensional  study  example.  The  geometric  interpre¬ 
tation  discussed  above  has  a  well-known  algebraic  counterpart  in  the 
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determination  of  eigenvectors  (or  characteristic  vectors)  and  the 
associated  eigenvalues  (or  characteristic  values).  The  directions  of 
the  principal  axes  of  the  concentration  ellipsoids  are  specified  by 
direction  cosines  which  are  given  by  the  components  of  the  normalized 
eigenvectors  of  a  generalized  variance  or  covariance  matrix  that 
includes  off-diagonal  terms  to  account  for  correlations.  Each  eigen¬ 
vector  belongs  to  an  eigenvalue,  and  the  length  of  any  principal  axis, 
for  a  particular  concentration  ellipsoid,  is  proportional  to  the 
square  root  of  its  associated  eigenvalue. 

It  can  be  shown  that  the  eigenvectors  are  orthogonal;  therefore, 
this  method  of  principal-  ixis  transformation  results  in  uncorrelated 
variates  whose  variances  are  proportional  to  the  axis  lengths  of  any 
specific  concentration  hyperellipsoid.  To  display  the  confidence 
limits  on  the  original  unrotated  object  estimate,  we  project  these 
axis  lengths  back  to  the  original  coordinate  axes  ip'  as  illustrated 
in  Figure  20(c) . 

B.  EXAMPLES 

In  our  ME  confidence  examples  we  choose  for  measurement  data 

constraints,  images  in  Fourier  space  derived  from  a  standard  test 

object.  That  object  consists  of  two  spikes  symmetrically  disposed 

on  a  support  of  l-i <49  at  positions  i=24  and  i=26.  The  "images"  in 

Fourier  space  differ  only  in  the  aperture  or  bandwidth  accepted  for 

their  measurements.  The  scaled  apertures  were  varied  from  an  almost 

lull  aperture  of  48  units,  out  of  49,  giving  almost  enough  resolution 

to  resolve  the  peaks,  through  24,  12  and  6  units  with  approximately 

an  eight-fold  maximum  degration  in  resolution.  This  afforded 

opportunities  for  as  much  as  an  eight-fold  possible  enhancement  by  ME 

superresolution.  The  gamut  of  inverse  noise  or  fluctuation  temperature 
0  1 9 

was  2  2  .  Larger  values  of  p  did  not  converge.  From  all  these 

examples  studied,  we  will  only  present  ten  particular  ones  in  this 
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Figure  20(c).  The  multidimensional  estimates,  including  the  set  of 
confidence  hands,  one  from  each  principal  axis  of  tlu 
hvperel 1 ipsoid,  as  a  generalization  of  20(b). 


report,  chosen  to  illustrate  the  main  results.  A  listing  is  given  in 
1'able  1.  The  ME  calculations  were  done  with  the  log  B  formulation  in 


the  examples.  This  formulation  used  in  these  examples,  it  will  be 
retailed,  is  appropriate  for  signal  statistics  that  are  Gaussian  in 
complex  amplitude  or  exponential  in  intensity. 


Table  1.  Parameters  for  Computations 


Aperture  Size 

Potential  Factor  of 
Super resolution 

Inverse  Temperatures, 

log-,1' 

6 

8 

0,  18 

12 

4 

0,  8,  17 

24 

2 

0,  16,  19 

48 

i 

0,  18 

The  results  are  arranged  in  sets  according  to  the  aperture  sixes 
shown  in  Table  1.  In  each  set  two  or  more  values  of  t  are  chosen  for 
examination.  Heading  each  set  there  is  a  display  of  the  ME  estimates 
for  the  given  aperture,  one  estimate  for  every  value  of  y  in  the 
gamut  of  b.  The  hidden  line  graphic  displays  sometimes,  for  claritv, 
require  a  separate  picture  for  the  positive  and  the  negative  values  of 
the  1  unction  displayed.  It  should  he  noted  that  the  discrete  points 
of  the  functions  are  connected  bv  straight  lines  often  giving  the 
tunctions  the  appearance  of  a  set  of  triangular  spikes.  This  is  merely 
an  artifact  of  the  representation. 

0  18 

The  ME  estimates  for  aperture  size  6  and  2  2  are  shown  in 

figure  21.  The  estimated  objects,  Oblp.i  are  plotted  vertically 

versus  i,  with  t he  estimates  for  various  values  of  ft  placed  one  behind 

tin'  other  in  a  third  dimension.  The  base  of  the  figure  is  zero.  For 

this  small  aperture  size,  superresolution  was  not  achieved,  even  at  the 

I  8 

coldest  temperature  that  we  could  simulate  corresponding  to  i-  =  2 


!hi  ivo  ,  .it  i  •-  24,  i  =  26  are  seen  to  be  rising,  however,  aJong 

w  i !  •  i, i  iiniini  .ring  ot  the  central  peak  lor  i-.  values  greater  than  2^. 
i'll'-.' ,  ml  ill  !  i  other  MK  estimates  we  exhibit  in  this  report,  have 
Mien  i  on  -t  ruined  to  he  normalized  to  unit  energy,  power  or  area,  so 
ti.cte  .lie  on  1  \  .«s  "independent"  variables,  which  are  further  constrained 
in  tin'  >h  me  t  iiod  i,i>.  explained  previously).  This  normalization  explains 
tin-  goner  1 1  i  ise  in  the  background  with  the  fall  in  the  peak  as  the 
est  ini.it  I-  tend-;  to  uniformity  when  .  approaches  zero. 

Die  orthonormal  eigenvectors  of  the  confidence  hyperellipsoid 
tor  :  2*  and  aperture  =  6  are  shown  in  Figure  22(a)  and  22(b).  Since 
these  ((—space  vectors  are  all  still  normalized,  in  the  figure  they 
represent  a  hvpersphere.  1’he  method  used  is  to  plot  the  amplitude  of 
the  p-spaee  components  of  the  eigenvector  (or  eigenfunction)  as  a  contin¬ 
uous  curve,  after  the  procedure  illustrated  in  Figure  20  f,.r  two  dimen¬ 
sions.  Tlie  eigenvectors  are  rank  ordered  according  to  the  strength  of 
their  associated  eigenvalues,  the  greatest  called  the  first  and  the  last 


called  the  49th.  The  hidden  line  plot  hides  all  but  the  first  of  the 


negative  peaks  so  we  separately  plot  the  positive  peaks  up  in  Fig¬ 
ure  22(a)  and  the  negative  peaks  up  in  Figure  22(b).  The  approximate 
zero  level  is  halfway  between  the  base  and  peaks.  These  eigenvectors 
were  then  normalized  according  to  the  square  roots  of  their  respective 
eigenvalues  to  form  ,  ■  and  scaled  by  \l  p"  •  The  result  for  49 

dimensions,  =  2°,  and  aperture  of  6  is  shown  in  Figure  22(c).  The 
base  of  the  figure  represents  minus  one  and  the  top  mesa  represents 
zero.  The  vectors  composed  of  the  sum  of  the  ME  estimates  ;p.^°^  1 
r-  are  shown  in  Figure  22(d)  for  our  present  case  of 


a  pc  r  t  u  re  s i ze  6  and 


....  (o)  , 

1  he  case  ■  p  .  mi  nus 


shown  in  Figure  22(c).  The  zero  ol  ip  is  the  base  o!  Figure  22(d) 
and  22(c).  Figures  22(d)  and  22(e)  are  the  hidden-line  versions  ot  the 
o'p.itMte  upper  and  lower  confidence  limits  shown  in  the  earlier 
illustrative  example  given  in  Figure  20(c).  Here  -  l.ob;  larger 


values  would  make  p  negative. 
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The  real  probability  density  function  for  our  problem  would  not 

have  any  negative  values  of  {p.},  but  our  multivariate  Gaussian  approxi- 

1  2 

mat  ion  has  tails  to  infinity.  For  a  large  enough  v  ,  p.  could  become 

negative.  We  note  in  Table  2  that  the  maximum  value  oi  J?  to  avoid 

this  was  found  to  vary  from  slightly  over  1  to  \l~2  as  r  varies  from 

small  to  large  values.  These  are  the  values  of  ■J7  used  in  the 

2 

examples  plotted  here.  interpreted  as  x  of  48  degrees  of  freedom, 
these  confidence  limits  would  represent  extremely  small  confidence 
coefficients.  Interpreted  as  for  one  degree  of  freedom,  for  the  large 
cases  where  all  but  one  o.  are  essentially  zero,  they  represent 
confidence  coefficients  of  70%  to  84%.  A  possible  explanation  for  this 
upper  band  of  the  range  of  multiplying  factors,  u  17=  VT  for  our 
multivariate  Gaussian  approximate  probability  density  function  mav  lie 
in  the  fact  that  we  are  dealing  with  the  case  of  Gaussian  statistics. 

Our  "log  formulation  used  in  these  examples  is  appropriate  for  this 

statistic.  In  the  Gaussian  case,  Ap^  =  (p)^,  but  recall  the 

de  f  in i t ion 


o 


Tims  for  Gaussian  statistics,  a  =  p.  Therefore  in  this  case, 
p  =  p  -  1*0  is  already  equal  to  zero.  Any  greater  multiplier  of  o 
will  make  p  negative.  This  one-dimensional  example  may  point  the  way 
to  a  proper  understanding  of  our  empirical  observation. 

We  see  in  Figures  22(c),  (d) ,  and  (e)  that  the  hvperellipsoid 
accumulation  is  almost  hvperspherical .  The  re  is  no  one  a  that  is 
remarkably  greater  than  the  others.  The  confidence  region  surrounds 
the  entire  estimate  almost  uniformly.  This  will  prove  to  he  true  for 
all  cases  of  small  T. 

Recalling  that  the  ■  are  equal  to  the  square  root  of  the  eigen¬ 
values,  we  see  in  Table  2  that  the  cases  of  t'  =  2  ,  for  any  aperture 
size,  all  have  an  almost  flat  distribution  of  a.,  varying  hv ,  at  most. 
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table  J .  Data  tor  Computations 


a  factor  of  2.  This  can  be  contrasted  to  other  cases  in  t  he  table 
for  larger  values  where  only  one  >9  is  significant  and  the  others  are 
orders  of  magnitude  smaller. 

1  8 

The  next  example,  still  with  aperture  of  6,  has  r  =  2  .  The 
eigenvectors  art-  shown  in  Figures  23(a)  and  (b)  .  For  this  large  ;• 
there  are  really  only  two  significant  o.  with  the  second  more  than 
four  times  smaller  than  the  first.  This  is  evident  in  Figure  23, 
a  plot  o I  the  projection  ot  \2  ■  i  The  contribution  to  the 

confidence  band  is  onlv  near  two  unresolved  peaks.  The  MK  estimates 
\'  2 ■'  i.-  are  shown  in  Figures  23(d)  and  (e) .  Again,  only  the  first 
two  eigenvectors  are  active  and  onlv  on  the  two  shoulders  of  the  two 
unresolved  peaks. 

increasing  the  aperture  size  to  12  we  see  in  Figure  24  that  lor 

8  1 8 
>  2  ,  the  MK  estimates  begin  to  resolve  the  two  peaks;  at  :•  =  2 

thev  are  far  more  resolved  than  demanded  by  the  Rayleigh  criterion. 

This  is  a  four-fold  Increase  in  the  resolution  of  this  aperture. 

For  this  aperture  of  size  12  and  r  =  2°  the  series  of  plots. 
Figures  25(a)  through  (f)  describe  the  nature  of  the  confidence  bands. 
I'lie  peaks  are  not  resolved  (Figure  25(f)),  and  the  base  represents 
p.  =  o).  The  confidence  bands,  again  as  in  the  previous  :  =  2°  case, 

are  derived  from  a  hypersphere  as  all  the  ■  ,  are  closely  equal  (see 
Table  2)  the  smallest  being  3/4  of  the  largest. 

For  -  2  and  aperture  12,  the  ratio  of  the  I irst  two  are 

l 

,/  ■  |  0.2H.  Thu.-;  we  must  consider  both.  Figure  2b(c)  shows  the 

relative  strengths  o|  their  associated  eigenvectors.  The  base  ot 
Figure  2b(c)  represents  minus  one.  The  elleel  on  the  contidenee  in  the 
est  im.it  e  ,  ul  p  .  i  an  he  seen  in  the  I  irst  two  rows  ot  Figures  Jh(d> 
and  (e)  as  a  broadening  ot  the  unresolved  single  spike  at  the  posit  inn-, 
i  24  and  i  2b,  wheie  the  two  object  spikes  might  be  expected  to 


and  a  |>e  i  t  ul  e  12,  the  ratio  n  1  the  I  irst  two  is 

i 

so  on  I v  the  l  irst  has  an  important  int  limine.  That  only 
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one  eigenvector  has  anv  influence  is  shown  in  Figure  -7(c)  where  the 
base  represents  minus  one.  The  estimates  p.^°^  •  \  2  •  seen  in 

Figures  27(d)  and  (e)  show  the  effect  ot  the  first  eigenvector  in 
putting  a  confidence  hand  at  twice  the  estimated  peak  height-.. 

ME  estimates  for  the  case  ol  aperture  size  24,  for  •  2  *  * 

are  shown  in  Figure  28  from  the  rear,  to  better  show  that  the  ME 
estimates  are  beginning  to  resolve  the  two  peaks  when  .'-2^.  It  is 
clear  in  this  reverse  plot  that  for  small  r  the  estimates  • p .  •  Lend 

to  a  flat  distribution  of  uniform  height  except  for  a  small  single 
unresolved  peak.  The  coni idence  estimate  is  developed  in  Figures  29(a) 
through  (e)  for  the  ease  .  =2°.  The  spread  in  eigenvalues  is  slowlv 

increasing  as  the  aperture  enlarges.  Here,  for  aperture  24,  the  ratio 
of  c  is  -  1.47,  still  showing  that  all  the  .  are  essential  lv 

equal.  This  can  he  noted  in  Figure  29(c),  the  base  of  which  is  at 
minus  one  and  the  mesa  top  at  zero.  The  upper  and  lower  confidence 
hands  are  shown  in  Figures  29(d)  and  (e),  respectively,  with  the  bases 
set  to  zero  level.  The  estimate  itself  is  shown  separately 

in  Figure  29(f)  with  the  base  at  zero  level. 

For  the  case  F  =  2^,  aperture  24,  just  before  obvious  superresolu¬ 
tion,  the  confidence  results  are  developed  in  Figures  30(a)  through  (e) . 
The  estimate  |p.^°^i  is  shown  separately  in  Figure  10(f)  with  zero  at 
the  base.  Here  the  ratio  of  the  first  two  •  is  a^/o  =  9.4,  and  the 
lirst  two  eigenvectors  explain  most  of  the  variance  as  .-eon  in  Fig¬ 
ure  30(c),  where  the  base  is  equal  to  minus  one.  The  upper  and  lower 

bounds  of  the  cc.nf  idence  hands  are  seen  in  Figures  30(d)  and  (e)  . 

19 

For  -  2  aperture  24,  see  Figure  31.  As  might  be  expected, 

)  2 

almost  all  the  variance  is  in  one  •  =  .  From  Figure  31(c)  we 

note  that  the  confidence  band  will  he  large  only  at  the  two  resolved 
peak,  as  seen  in  Figures  31(d)  and  (e). 

The  final  example  for  an  almost  full  aperture  of  48  is  interest  ini' 
in  that  it  asks  no  superreso 1 ut i on  ot  the  MK  method  and  reveals  the 
behavior  in  an  unclouded  wav.  The  ME  estimates  p.1'1  themselves  are 
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Ortho-normal  eigenvectors  projected  on  p-space  for 
aperture  size  24  and  6=2°.  Positive  peaks. 
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Figure  29(b).  Ortho-normal  eigenvectors  projected  for  p-space  for 
aperture  size  24  and  8=2°.  Negative  peaks. 
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given  in  Figures  32(a)  and  (b)  for  the  standard  and  reverse  view, 
respectively.  in  Figure  32(b)  we  see  that  the  peaks  are  already 
resolved  at  p  =  2°.  Even  at  this  large  aperture  the  total  gamut  of  the 
is  only  a  factor  of  two,  and  all  the  eigenvectors  contribute  to 
explain  the  variance  as  can  be  seen  in  Figure  33(c).  The  bounds  of  the 
confidence  bands  are  shown  in  Figures  33(d)  and  (e) ,  and  the  estimate 
itself  {p.(°^}  is  given  in  Figure  33(f). 

The  very  last  example,  p  =  2  ,  aperture  of  48,  is  shown  in  the 

series  of  Figure  34.  In  this  extreme  case  the  first  o  is  more  than  two 
thousand  times  larger  than  the  second  (see  Table  2)  and  clearly 
accounts  for  almost  all  the  variance.  The  hyperellipsoid,  in  this  as 
in  all  the  other  high  P  cases,  is  a  "hyperneedle."  In  Figure  34(c)  the 
projected  components  of  this  vector  can  be  seen  to  lie  exactly  on  the 
peaks  of  the  object  estimate,  which  have  values  almost  exactly  reaching 
one  half  at  the  peaks  (0.49866)  and  zero  at  the  background  (0.00006). 

The  upper  and  lower  confidence  bands  are  shown  in  Figures  34(d)  and  (e) . 

It  should  not  be  assumed,  by  way  of  generalization,  that  the 
important  eigenvectors  will,  in  all  cases,  mimic  the  shape  of  the 
estimate  at  high  8  values  as  seen  in  this  example.  In  another  example, 
not  illustrated  here,  with  a  one  spike  object,  the  important  eigen¬ 
vectors  did  cluster  around  the  estimated  spike,  but  had  both  symmetric 
and  antisymmetric  forms  whose  large  components  did  not  necessarily 
exactly  coincide  with  the  position  of  the  object  peak. 
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Figure  34(a). 


Orthonormal  eigenvectors  projected  on 
p-space  for  aperture  size  48  and  B  2  ‘ 
Positive  peaks. 
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SECTION  6 


SUMMARY  OF  RESULTS 

The  work  on  the  simulation  of  imaging  with  thinned,  sparse,  random 
arrays,  done  in  the  first  two  years  of  this  contract,  was  summarized 
in  the  Annual  Report,  June  1979.  In  this  section,  we  summarize  the  final 
year's  work  on  the  ME  method.  We  present  a  general  view  of  maximum  en¬ 
tropy  (ME)  as  a  method  to  choose  the  most  probable  object  estimate  from 
the  set  of  all  the  possible  objects  consistent  with  the  degraded  measure¬ 
ments.  The  probabilities  are  assigned  in  accordance  with  the  physical 
statistics  of  the  problem  at  hand.  The  entropy  is  the  logarithm  of  the 
probability.  We  show  that  the  two  contending  forms  of  entropy  found  in 
the  current  literature,  the  "logB"  and  "BlogB"  forms,  are  special  cases 
of  a  more  general  entropy  based  on  quantum  statistics.  Our  particular 
formulation  of  Lhe  ME  method  can  be  expressed  as  an  analogy  to  the  thermo¬ 
dynamical  method  of  the  minimization  of  the  free  energy.  This  analogy 
allows  a  natural  way  to  introduce  the  matter  of  fluctuations  or  noise 
into  the  method.  The  problem  of  a  complete  theory  of  the  fluctuations 
still  remains,  awaiting  a  theory  of  the  analogy  to  the  specific  heat. 

Several  examples  of  ME  estimation,  in  one  and  two  dimensions  and 
for  varving  values  of  noise  parameter,  were  calculated  and  the  results 
are  presented  graphically.  The  limiting  effects  of  noise  on  the 
possibilities  of  superresolut ion  are  described. 

A  general  method  for  assessing  the  degree  of  confidence  in  the 
multidimensional  ME  estimates  was  developed.  The  ME  formulation  we 
developed  provides  a  multidimensional  probability  function  lor  all 
possible  potential  object  estimates  consistent  with  tin-  measured-image 
signal  data.  The  most  probable  one  is  picked  as  the  estimate.  We 
expand  this  distribution  function  near  its  maximum  for  simplicity,  and 
approximate  it  as  a  multivariate  Gaussian  distribution.  Bv  principal 
axis  transformations,  we  derive  the  variances  and  then  project  them 
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back  to  the  original  space  of  the  estimates  as  confidence  bands.  A 
clu-square  test  in  many  dimensions  is  used  as  the  basis  of  the  confi¬ 
dence  estimate.  A  particular  example  was  studied  in  some  detail.  Two 
de 1 1 a- f line t i on  spikes  on  a  support  ot  dimension  49,  separated  bv  one 
space,  was  chosen  as  the  unknown  test  object.  Various  images  were 
derived  hv  choosing  progressively  smaller  apertures  to  test  the  method 
as  a  function  of  super resolut ion  and  fluctuation  or  noise  temperature. 

Manv  variations  were  presented  spanning  a  range  of  inverse  fluctuation 

0  19 

or  noise  temperature  from  2  to  2  and  potential  superresolution 
factors  of  from  one  through  eight. 
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PAPERS  RESULTING  FROM  AFOSR  SUPPORT 

Comments  on  "Spectral  Estimation:  An  Impossibility?"  B.H.  Soffer 
and  Ryoichi  Kikuchi,  Proc.  IEEE  67 ,  1672  (1979). 

In  preparation  are:  "Maximum  Entropy  Image  Restoration  II  and 
III,  to  be  submitted  to  JOSA. 
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Maximum  entropy  image  restoration.  I.  The  entropy  expression 

Ryoichi  Kikuchi  and  B.  H.  Softer 
Hughes  Research  Laboratories.  Malibu.  California  90265 
(Received  3  July  1976;  revision  received  11  August  1977) 

The  two  entropy  expressions,  log  B  end  -  S  logfl  (where  B  is  the  local  brightness  of  the  object  or  its 
spatial  spectral  power)  used  in  maximum  entropy  (ME)  image  restoration,  are  derived  as  limiting  cases  of  a 
general  entropy  formula.  The  brightness  B  is  represented  by  the  n  photons  emitted  from  a  small  unit  area  of 
the  object  and  imaged  in  the  receiver.  These  n  photons  can  be  distributed  over  z  degrees  of  freedom  in  q{n.t\ 
different  ways  calculated  by  the  Bose-Einstein  statistics.  The  entropy  to  be  maximized  is  interpreted,  as  in 
the  original  definition  of  entropy  by  Boltzmann  and  Planck,  as  logy(n ,  z)  This  entropy  expression  reduces 
to  logB  and  -  fl  log  B  in  the  limits  of  n>z  >  1  and  n<z,  respectively.  When  n  is  interpreted  as  an 
average  h  over  an  ensemble,  the  above  two  criteria  remain  the  same  (with  n  replaced  by  h),  and  in  addition 
for  the  z  =  1  case  the  logB  expression,  used  in  ME  spectral  power  estimation,  is  derived  for  n  >  z  =  1. 


I.  INTRODUCTION 

The  ideal  of  maximum  entropy  (ME)  restoration  has 
appeared  in  various  forms  in  the  technical  literature 
spanning  a  gamut  of  disciplines  from  seismic  spectral 
power  estimation  and  prediction  to  astronomical  image 
restoration.  Hie  subject  was  recently  reviewed,1  and 
an  extensive  bibliography  was  given.2 

The  concept,  as  applied  to  the  image-restoration 
problem,  can  be  viewed  as  an  attempt  to  find  the  radiant 
spatial  power  pattern  of  the  object  by  maximizing  the 
entropy  of  the  pattern  subject  to  the  image -data  con¬ 
straints.  However,  the  method  has  a  weakness  in  its 
very  foundation.  There  are  two  schools  of  thought  which 
differ  as  to  the  form  of  the  entropy  to  be  maximized. 

For  convenience,  we  refer,  in  the  remainder  of  this  pa¬ 
per,  to  these  two  as  the  "logB”  and  the  B  logB" 
schools.  (Although  the  B  is  defined  more  fully  below, 

B  may  be  interpreted,  for  purposes  of  this  introduction, 
as  the  brightness  of  the  object  or  as  its  spatial  spectral 
power.)  In  his  pioneering  work  in  spectral  power  esti¬ 
mation  for  geophysical  applications ,  Burg3  used  the  logB 
expression  for  the  entropy,  following  directly  from  Shan¬ 
non’s4  work.  In  the  image-restoration  field,  Ponsonby* 
and  Wernecke  and  D’Addario3  also  used  the  logB  expres¬ 
sion.  In  a  recent  successful  work  on  astronomical  image 
restoration,  Frieden*  used  the  -BlogB  expression  for 
his  entropy 

Since  these  two  schools  have  been  working  indepen¬ 
dently,  the  question  arises  as  to  which  entropy  expres¬ 
sion  is  the  correct  one  to  use.  This  question  was  noted 
by  Frieden1  and  Wernecke  and  D’Addario2  as  a  challeng¬ 
ing  problem.  The  present  paper  is  an  effort  to  solve 
this  problem  by  demonstrating  the  conditions  under  which 
the  logB  and  the  -  BlogB  expressions  should  be  used. 

We  prove  that  these  expressions  are  limiting  cases  of 
a  more  general  expression.  We  are  mainly  concerned 
with  the  problem  of  image  restoration  (as  worked  on, 
for  example,  by  Ponsonby,5  Wernecke  and  D’Addario, 2 
and  Frieden1-*).  However,  our  conclusions  can  be  easily 
translated  into  the  spectral  power  estimation  problem 
(worked  on,  for  example,  by  Burg,3  Radoski,  Fougere 
and  Zawalick,1  and  Ulrych*). 

The  present  paper  is  based  on  two  basic  interpreta¬ 
tions  of  the  ME  restoration  of  an  image.  The  first  is 
that,  when  an  image  pattern  is  given,  there  are  many 
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different  possible  object  patterns  that  can  produce  the 
same  given  image  pattern.  This  many-to-one  mapping 
from  object  to  image  is  an  intrinsic  property  of  any 
real  measuring  system  that  has  a  finite  aperture  or 
bandwidth  and  hence  a  point-spread  function  of  nonzero 
width.  Another  way  of  interpreting  this  point  is  the  fol¬ 
lowing.  Hie  information  contained  in  the  object  pattern 
is  carried  by  photons  through  an  optical  system  and  is 
recorded  in  the  image.  Information  is  lost  in  the  opti¬ 
cal  system,  and  our  problem  is  to  analyze  the  nature  of 
the  optical  communication  channel,  and  then  work  out 
the  restoration. 

The  first  interpretation  leads  directly  to  the  second. 
Since  there  are  many  possible  object  patterns,  we  are 
faced  with  the  problem  of  choosing  one  or  the  problem 
of  making  a  criterion  for  choosing  one.  The  ME  method, 
as  interpreted  in  the  present  paper,  provides  such  a 
criterion.  It  presupposes  that  a  probability  of  occur¬ 
rence  can  be  assigned  to  each  possible  pattern,  and  then 
the  ME  method  chooses  the  most  probable  pattern  as  the 
estimated,  or  restored,  object  pattern. 

This  second  point  of  our  interpretation  of  the  ME  re¬ 
storation  method  may  encounter  resistance  from  some 
readers.  Thus  we  hastily  add  some  explanation.  The 
word  entropy  is  often  interpreted  either  as  the  concept 
form  which  thermodynamic  equilibrium  is  presupposed 
or  as  a  measure  of  "randomness”  (often  without  a  strict 
definition  of  randomness).  We  do  not  invoke  these  con¬ 
cepts;  in  the  present  paper,  the  word  entropy  is  used  in 
the  sense  of  Boltzmann’s  definition  and  is  defined  as  the 
logarithm  of  the  probability.  Since  the  logarithm  is  a 
monotonic  function  of  the  argument,  to  find  a  maximum 
of  the  entropy  is  to  find  a  maximum  of  the  probability . 

Based  on  these  two  fundamental  concepts,  we  show  in 
the  present  paper  what  the  probability  of  finding  an  ob¬ 
ject  pattern  is,  and  then  we  show  under  what  conditions 
the  entropy  form  to  be  used  is  logB  or  -  B  logB.  Instead 
of  starting  from  Shannon's  entropy  expression,  as  Burg3 
did,  we  go  much  further  back.  (Shannon’s  entropy  is 
shown  to  be  a  special  case  in  a  later  section.)  We  start 
(in  Sec.  II)  with  the  probability  of  a  brightness  pattern 
in  an  object  (based  on  the  Bose-Finstein  statistics, 
which  photons  obey).  This  leads  to  the  logB  and  -  BlogB 
distinction  for  certain  special  cases.  The  number  of 
degrees  of  freedom  for  the  photons,  a  basic  concept  in 
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Sec.  II,  is  discussed  separately  and  in  detail  in  Sec.  ni. 

The  entropy  used  in  Sec.  n  is  perfectly  authentic 
and  in  keeping  with  Boltzmann’s  original  concept;  how¬ 
ever,  the  formulas  in  Sec.  II  differ  from  the  ordinarily 
seen  -  f  log/’  form  of  the  entropy.  To  resolve  the  ques¬ 
tions  some  readers  may  have  concerning  these  formulas 
a  simple  example  of  the  multinomial  expression  applied 
to  dice  throwing  is  given  in  Sec.  IV.  The  concepts  of 
a  priori  probability  and  the  familiar  entropy  form 
-/  logf  are  explained  in  that  section. 

The  dice  example  in  Sec.  IV  leads  to  the  next  section, 
in  which  Shannon’s  and  Burg’s  formalism  are  derived 
and  interpreted.  Section  V  presents  the  formulation 
based  on  the  ensemble,  which  can  be  useful  in  accounting 
for  noise  in  the  ME  formulation. 


11.  PROBABILITY  OF  AN  OBJECT  PATTERN 

We  assume  that  the  object  of  interest  is  a  two-dimen¬ 
sional  distribution  of  photon  sources  in  the  far  field. 

The  two-dimensional  space  in  which  the  object  is  viewed 
is  divided  by  a  hypothetical  rectangular  mesh  into  equal 
square  cells  of  area  tc*  each.  The  choice  of  the  size  n> 
is  arbitrary  at  this  stage,  but  not  without  important  con¬ 
sequences,  as  will  be  discussed  below. 


The  number  of  photons  coming  from  the  7th  object  cell 
( j  =  1 ,  2 , .  .  . )  to  the  receiver  in  the  observation  time  t 
is  written  as  the  dimensionless  quantity  n,,  (These  pho¬ 
tons  may  be  emitted  from  the  object  or  reflected  from 
it.)  The  ultimate  goal  of  the  maximum  entropy  method 
we  present  in  this  paper  is  to  -calculate  the  most  probable 
spatial  pattern  {n y}  for  the  object.  To  free  the  presenta¬ 
tion  from  unessential  complications ,  we  assume  that  the 
photons  are  quasimonochromatic  with  bandwidth 
Further,  it  is  assumed  that  the  object  is  not  colorful, 
which  implies  that  all  A  is,  =  Ap.  This  simplifies  the  dis¬ 
cussions  of  coherence  volume’  and  of  the  number  of  de¬ 
grees  of  freedom  for  the  photons . 

A  basic  postulate  of  our  analysis  is  that  for  each  cell 
there  correspond  z  degrees  of  freedom  for  the  photons 
detected  from  that  cell.  The  value  z  is  proportional  to 
the  area  wz,  the  bandwidth  Ay,  the  aperture  of  the  de¬ 
tection  apparatus,  and  the  time  interval  t  of  observation. 
However,  we  first  leeve  the  value  of  z  unspecified,  ex¬ 
cept  that  it  is  a  given  positive  integer  and  is  constant. 

The  meaning  and  the  value  of  z  are  discussed  in  detail 
in  the  next  section. 

The  number  n,  of  photons  are  arranged  (distributed) 
over  the  z  degrees  of  freedom  within  the  bandwidth  Ay 
with  the  condition  that  multiple  occupancy  in  one  degree 
of  freedom  is  allowed  because  of  the  Bose  nature  of 
photons.  The  number  of  macroscopically  indistinguish¬ 
able  ways  q,  that  such  an  arrangement  can  be  formed  is 
expressed  by  the  combinatorial  formula  for  Bose-Ein- 
stein  statistics10: 


q  ,(»,)  = 


{n,  +  z  -  1)1 
n,\  (z  -  l)t 


(2.1) 


Each  of  these  arrangements  is  a  quantum-mechanical 
state  of  the  n,  photons.  We  now  postulate  that  each  of 
the  q,  different  (microscopically  distinguishable  but 


macroscopically  indistinguishable)  arrangements  occurs 
with  the  same  o  priori  probability.  This  postulate, 
which  corresponds  to  the  "equal-weight”  principle  of 
quantum  mechanics  that  each  eigenstate  of  the  Hamilto¬ 
nian  is  occupied  with  the  same  probability,  is  the  basis 
of  the  statistical  analysis  of  this  paper.  The  equal- 
weight  postulate  allows  interpreting  q ,  as  the  weight  fac¬ 
tor,  or  the  degeneracy,  for  cell  intensity  n,.  Thus  q , 
is  proportional  to  the  probability  that  it,  photons  are  dis¬ 
tributed  over  the  z  degrees  of  freedom. 

The  concept  of  the  probability  q,  in  Eq.  (2. 1)  can  be 
understood  by  a  simple  example  of  throwing  dice.  Sup¬ 
pose  we  throw  two  dice  and  count  the  sum  of  the  two 
numbers.  The  number  of  different  ways  the  sum  5,  for 
example,  appears  is  four,  because  the  possible  combi¬ 
nations  are  (1 , 4),  (2,3),  (3,2),  and  (4,1).  We  write  this 
as  q( 5)  =  4.  The  general  expression  for  the  number  of 
ways  q(n)  that  the  sum  n  appears  is 

q(n)  =  6  —  !  7  —  w I  .  (2.2) 

For  fair  dice,  the  probability  that  the  sum  n  appears  is 
proportional  to  q(rt).  Corresponding  to  what  we  said 
concerning  Eq.  (2. 1),  the  postulate  for  this  example  is 
that  each  of  the  four  states  (1,4),  (2,3),  (3,2),  and  (4,1) 
appears  with  the  same  a  priori  probability. 

Different  from  q(n)  for  dice  in  Eq.  (2.2),  the  algebraic 
expression  for  q/nj)  for  photons  in  Eq,  (2. 1)  contains 
factorials.  For  the  purely  practical  reason  that  in 
working  mathematically  with  factorials,  it  is  easier  to 
first  take  the  logarithm,  we  take  the  logarithm  of  q,  in 
Eq.  (2. 1)  and  introduce  an  expression  sy: 

sjn^zlnqjfriy)  ,  (2.3) 

We  do  not  yet  call  this  quantity  entropy  because  we  wish 
to  avoid  invoking  the  connotations  and  misconceptions 
sometimes  associated  with  the  word.  We  now  examine 
some  limiting  cases  of  Eqs.  (2. 1)  and  (2.  3). 

[A]  When  z  =  l,  Eqs.  (2. 1)  and  (2. 3)  give 

<7,  =  1  and  sy  =  0  .  (2.4) 

This  is  understandable  because,  when  the  number  of  de¬ 
grees  of  freedom  is  unity,  all  n,  photons  occupy  the  same 
degree  of  freedom  and,  as  photons  are  indistinguishable 
from  one  another,  the  degeneracy  is  unity. 

[b]  When  1  <  z  « n, ,  it  is  easier  to  work  with  the 
logarithmic  form  s,(nj )  because  we  can  use  Stirling’s 
approximation.  Neglecting  z/n,  in  the  expansion,  we 
arrive  at 

s,  =  (z  -  ljlroo-lnjz  -  1)!  .  (2.5) 

[C]  When  z  >>nl,  we  again  use  Stirling’s  approxima¬ 
tion,  neglect  ny/z,  and  approximate  Eq.  (2. 3)  as 

s,  =  rt,lnz -wy(lnn> -  1)  .  (2.6) 

This  expression  is  exactly  the  classical  (i.e, ,  Maxwell  - 
Boltzmann  particle  statistics)  limit  of  Eq.  (2. 1)  when 
nt  "particles”  are  distinguishable,  in  agreement  with  the 
general  property  that  in  the  limit  of  z  Bose  statis¬ 
tics  approach  classical  statistics. 

We  are  interested  in  the  entire  object  made-  of  many 
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cells,  each  of  area  w2.  We  ask  the  number  of  ways  n,, 
Ht, . . .  photons  come  from  the  first,  second, . . .  cells, 
respectively,  independently  of  each  other.  Then  the  en¬ 
tire  number  of  ways  Q  is  the  product  of  q,’s  in  Eq.  (2. 1) 
since  each  cell  is  independent : 

> 

We  then  see  what  the  logarithm  of  Q  looks  like.  In  writ¬ 
ing  it,  we  introduce  n  for  the  total  number  of  photons: 


j 


and  define,  for  mathematical  convenience,  a  normalized 
object  pattern 

pt*n,/n  .  (2.9) 

This  introduction  of  the  normalized  p ,  does  not  mean 
that  we  are  assuming  p,  to  be  independent  of  the  total 
intensity  of  light  n;  when  the  object  is  illuminated  by 
another  light  source,  for  example,  pt  can  vary  as  the 
total  illumination  changes  because  reflectivity  may  not 
be  linear  in  intensity  and  uniform  over  the  object.  At 
any  rate ,  it  is  meaningful  to  talk  about  the  relative  local 
intensity  p  for  a  given  illumination  condition,  and  for  a 
given  recorded  image. 

The  three  limiting  cases  of  S^lnQ  are  given  below: 

[A]  When  z  =  1 , 

<?  =  1  and  S  =  lnQ  =  0  .  (2.10) 

This  means  that  the  probability  Q  is  independent  of  the 
photon  distribution  \p,} ,  whatever  {p,}  may  be,  and  hence 
all  distributions  {p,}  in  the  object  space  are  equally  like¬ 
ly! 

[B I  When  1<  *«»«,,  Eq.  (2. 5)  leads  to 

l)lnn  -ln(z  -  1)!  ]  +  (z  -  1)  £ln#>r  . 

I  /  ! 

(2.11) 

[C]  When  z  »nJ(  Eq.  (2.6)  leads  to 

S  =  S  si  =  nin{z/n)-n^2p,{lnp,  -  1)  •  (2.12) 

y  / 

Following  Boltzmann11  and  Planck,12  it  is  customary 
to  call  the  logarithm  of  the  degeneracy  (weight  factor , 
or  probability)  the  entropy.  Since  the  probability  is 
proportional  to  the  number  of  ways  a  certain  event  oc¬ 
curs  (when  each  way  appears  with  the  same  a  priori 
probability),  our  S,  which  is  the  logarithm  of  the  num¬ 
ber  of  ways,  may  also  be  called  the  entropy. 

In  the  '‘maximum  entropy”  algorithm  for  calculating 
the  object  distribution,  z  and  the  total  intensity  n  are 
fixed  numbers;  therefore  only  the  p -dependent  terms  in 
Eqs.  (2. 11)  and  (2. 12)  are  significant.  When  we  exam¬ 
ine  the  p-dependent  terms,  we  note  that  Eqs.  (2,11)  an 
and  (2.12)  contain  Zlnp,  and  -Zp^lnp,,  respectively. 

The  pj  in  these  expressions  is  the  normalized  brightness 
which  we  wrote  as  B  in  the  introduction.  Therefore  it 
is  legitimate  to  identify  case  [B]  with  the  log#  expres¬ 
sion  used  in  ME  Image -restoration  studies2’1  and  case 
(C]  with  the  -BlogB  expression.*  (We  deliberately 


use  B,  for  brightness,  rather  thanp  because  of  the  opin¬ 
ion,  sometimes  heard,  that  the  entropy  is  always  of  the 
form  -plnp  and  never  lnp.)  In  each  of  these  expres¬ 
sions,  an  accurate  value  of  z  is  not  needed  in  finding 
the  distribution  {/>,}  that  maximizes  the  entropy.  What 
is  needed  is  the  knowledge  of  the  ratio  n/z.  The  num¬ 
ber  of  photons  n/z  per  degree  of  freedom  (or  per  mode) 
is  a  useful  concept  and  is  widely  used  in  the  statistics 
of  radiation. 11 

One  exception  to  which  the  n/z  criterion  so  far  cannot 
apply  is  case  [A ],  in  which  z  =  1.  The  analysis  in  this 
section  says  that  case  [A],  to  which  the  wave  analy¬ 
sis2’ 1,2  initiated  by  Burg  actually  belongs,  cannot  be 
classified  either  to  the  log5  case  or  the  -  BlogB  case. 
We  return  to  this  problem  in  Sec.  V,  in  which  we  treat 
the  problem  based  on  a  fixed  value  of  the  average  of  n 
rather  than  n  itself. 

ID.  NUMBER  OF  DEGREES  OF  FREEDOM 

The  idea  of  the  number  of  degrees  of  freedom  z  tor 
photons  used  in  the  previous  section  is  basic  in  this  pa¬ 
per.  Therefore  we  discuss  the  concept  in  detail  in  this 
section.  The  z  for  photons  emitted  from  an  area  tc2  of 
the  object  space  and  observed  within  time  t  can  be  de¬ 
veloped  in  either  real  space  or  phase  space. 

In  real  space,  first  we  consider  the  coherence  volume 
for  a  photon.  In  the  longitudinal  direction  of  propaga¬ 
tion,  this  volume  may  be  considered  to  have  a  coherence 
length  l  given  by  the  coherence  time  r  ~  l/Ae  multiplied 
by  the  velocity  of  propagation  c : 

f=CT  .  (3.1) 

In  the  transverse  direction,  the  coherent  area  u  grows 
by  propagation  to  a  large  range  R  and  is  inversely  pro¬ 
portional  to  the  area  of  the  source  wl.  Using  an  arbi¬ 
trary  cutoff  for  the  acceptable  degree  of  partial  coher¬ 
ence,  we  may  write  a  simple  expression  for  o  from  the 
Zernicke-van  Cittert14  theorem  in  the  far-field  limit: 

,  (3.2) 

where  X  is  the  wavelength  of  the  photons.  Again,  note 
the  dependence  of  a  on  the  source,  except  for  R *. 

The  coherence  volume  may  be  taken  as  the  product  of 
the  coherence  area  o  and  coherence  length  l : 

uM#  =  cTO  =  csR2/i^ikeM'2  .  (3.3) 

The  coherence  volume  corresponds  to  one  degree  >( 
freedom  for  the  photon;  it  is  not  possible  to  distinguish 
the  photons  by  interference  experiments,  for  example, 
in  one  degree  of  freedom  or  in  one  coherence  volume. 

Now  we  define  z.  Suppose  we  detect  photons  in  a  de¬ 
tection  time  /.  Ignoring  the  transverse  coherence  area 
for  a  moment,  we  can  say  that  we  have  detected  these 
photons  in  f/r  =  z  coherence  lengths  or  degrees  of  free¬ 
dom,  The  number  of  degrees  of  freedom  should  also 
depend  upon  the  ratio  of  the  area  A  of  the  detector  aper¬ 
ture  to  the  coherence  area  o  of  a  photon.  This  ratio 
contributes  a  factor  A/o  in  z,  so  z  may  be  viewed  as  the 
number  of  times  the  coherence  volume  cto  (which  rep¬ 
resents  a  degree  of  freedom)  is  contained  in  the  "detec¬ 
tion  volume”  cfA : 
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1  <s  6  K 


(3.4) 


We  may  think  of  the  number  of  degrees  of  freedom  as 
being  made  of  two  factors :  a  temporal  one  z ,  and  a  spa¬ 
tial  (two-dimensional  area)  z,.  Whenever  either  z,  or  zt 
is  less  than  unity,  the  photons  detected  are  still  indistin¬ 
guishable,  so  we  round  z,  or  z,  up  to  one  degree  of  free¬ 
dom.'  We  have  used  a  simplified  definition  of  the  num¬ 
ber  of  degrees  of  freedom  to  avoid  nonessential  compli¬ 
cations;  we  assume  that  simple  reciprocal  relations 
hold  between  wz  and  a,  and  between  Ap  and  r,  and  that 
all  Ai»(  =  Ap.  A  detailed  study  of  coherence  volume  with 
more  general  partially  coherent  sources  given  in  Ref. 

9  agrees  with  the  qualitative  presentation  given  here. 

The  number  of  degrees  of  freedom  can  also  be  repre¬ 
sented  in  phase  space.  The  phase  space  of  the  conjugate 
variables,  time,  and  frequency  (bandwidth),  is  commonly 
employed  in  communication  and  signal  processing  theory 
Utilizing  the  sampling  theorem,  this  phase  space  is  usu¬ 
ally  partitioned  into  equal  unit  area  slices  at  equal  time 
intervals  representing  one  degree  of  freedom  (2,  =  1)  per 
measurement  in  the  sampling  of  a  time  series.  Alter¬ 
nately,  it  is  partitioned  into  equal  unit  area  slices  at 
equal  frequency  differences  representing  one  degree 
of  freedom  (z,  =  l)  per  measurement  in  the  sampling  of 
a  spectrum. 

Since  it  is  not  necessary  to  have  these  particular  par¬ 
titionings  or  to  have  only  one  degree  of  freedom  per 
measurement  (except  for  the  satisfaction  of  the  sampling 
theorem),  measurements  are  not  always  made  in  that 
fashion.  As  we  shall  see,  for  example,  in  the  observa¬ 
tion  of  Poisson  temporal  processes,  z,  is  necessarily 
greater  than  one. 

The  two  variables,  spatial  extent,  and  spatial  frequen¬ 
cy  which  are  conjugate  to  each  other,  define  additional 
dimensions  of  phase  space  besides  the  familiar  time- 
bandwidth  dimensions.  This  geometrical  phase  space 
of  space  and  reciprocal  space  has  been  used  in  studies15 
to  define  the  number  of  degrees  of  freedom  in  images . 
This  is  our  z,  factor.  We  include  in  the  image  analysis 
and  estimation  problem  the  previously  neglected  (in 
image  studies)  z,  to  count  the  entire  number  of  degrees 
of  freedom  in  an  object.  Our  previous  definitions  lead¬ 
ing  to  the  number  of  degrees  of  freedom  (Eq.  (3.4)1  are 
simply  a  casting  of  phase  space  ideas  in  terms  of  coher¬ 
ent  volumes  and  detection  volumes . 

Several  comments  are  in  order.  Changing  the  arbi¬ 
trary  size  w  of  the  object  space  cells  will  have  impor¬ 
tant  effects  on  the  reconstruction  and  possibly  even  on 
the  statistics.  This  is  not  a  defect  of  the  method,  but 
rather  one  of  its  important  assets.  For  example,  mak¬ 
ing  w  smaller  than  the  resolution  limit  set  by  the  detec¬ 
tion  aperture  A  will  result  in  "super-resolution.”  When 
we  make  the  object  cell  size  to*  smaller  than  a  resolu¬ 
tion  element  to  achieve  super-resolution,  z,  remains  at 
its  minimum  value  of  one  degree  of  freedom,'  but  the 
number  of  photons  decreases  in  proportion  to  to*.  This 
is  an  important  example  of  a  case  where  the  n/z  ratio  is 
not  a  constant  determined  by  physical  properties  of  the 
source.  The  number  of  photons  per  degree  of  freedom 


decreases  as  the  super-resolution  increases  as  A/a 
<  l(and  in  the  same  fashion  as  f/r<  1).  Super-resolution 
may  be  viewed  as  a  special  case  of  reconstructing  from 
under-sampled  data.  More  general  cases  of  undersam¬ 
pling  of  the  aperture  plane  (e.g. ,  sparse  arrays  of  an¬ 
tennas),  which  do  not  involve  super-resolution  at  all  in 
the  reconstruction,  can  also  be  handled. 

Throughout  this  discussion  we  have  assumed  rather 
ideal  detection  processes  in  which  the  photon  distribu¬ 
tion  is  closely  mirrored  by  the  distribution  of  photoelec¬ 
trons,  or  of  exposed  silver  halide  grains  in  photography. 
The  accumulation  of  photons  has  been  assumed  to  be  pro¬ 
portional  to  time.  Similarly,  we  have  assumed  that  the 
number  of  collected  photons  is  proportional  to  the  de¬ 
tector  area  A.  When  the  efficiency  oi  photoelectron  de¬ 
tection  is  less  than  unity,  binomial  distributions  for  the 
probability  of  detection  and  nondetection  alter  the  distri¬ 
butions.16  However,  for  the  estimation  of  n  and  n  ;z,  it 
is  sufficiently  accurate  to  take  thephotoeiectron  distribu¬ 
tion  as  a  mirror  of  the  photon  distribution.17  in  fact,  the 
distribution  devised  by  Mandel16  for  photoelectrons  from 
rather  different  considerations  is  exactly  the  negative 
binomial  distribution  which  we  also  display  for  bosons. 
Photographic  detection  is  complicated  by  a  threshold  ef¬ 
fect,  but  the  effects  of  photon  statistics  have  been  theoreti¬ 
cally  shown1’ to  display  the  characteristic  boson  "clumping” 
effects  with  increasing  n/z  in  the  density  versus  expo¬ 
sure  photographic  characteristic  curve,  consistent  with 
the  Bose  distribution.  The  distortions  of  detection  will 
rarely  be  so  great  as  to  alter  the  estimate  of  the  magni¬ 
tude  of  n/z,  which  is  all  that  is  required  here. 

An  absorption  at  the  source  or  at  the  detector  as  well 
as  change  in  the  illumination  intensity  (in  the  case  of 
reflection)  not  only  change  the  total  intensity,  but  may 
also  influence  the  statistics.  If  we  have  a  priori  knowl¬ 
edge  of  this,  it  could  be  taken  into  account  to  determine 
a  corrected  n/z  appropriate  to  the  statistics  before  the 
lossy  process. 

Let  us  examine  some  examples  which  have  different 
n/z  values.  If  we  had  some  a  priori  knowledge  of  the 
photon  frequency  distribution  function  of  the  object,  for 
example,  if  we  know  that  the  radiation  detected  at  the 
frequency  v  was  from  a  black  body  at  a  certain  tempera¬ 
ture  T,  then  we  could  use  the  well-known  relation  for 
the  average  number  of  photons  per  mode  (or  per  degree 
of  freedom)  at  thermal  equilibrium  n/z  as  a  function  of 
T  and  v: 

n/z=[exp(hv/kT)-l]'1  ,  (3.6) 

where  h  is  Planck’s  constant.  A  reduced  value  of  n/z 
is  employed  when  A/o<  1,  as  in  super-resolution,  or 
when  f/r<  1  as  mentioned  before.  As  an  illustration, 
suppose  we  were  to  restore  an  image  of  the  sun,  which 
is  approximately  a  blackbody,  the  mean  chromosphere 
temperature  of  which  could  be  taken  closely  at  6000’ K. 
Using  Eq.  (3.6),  we  find  that  n/z  •■<  1  in  the  visible  re¬ 
gion  and  that  -BlogB  is  the  appropriate  limiting  form 
of  the  entropy  expression.  For  a  solar  image  using 
wavelengths  larger  than  3  pm,  n/z  is  greater  than  one, 
and  the  correct  form  would  be  logs.  Since  all  the  planets 
have  a  very  small  n/z  ratio  in  the  visible  and  infrared 
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regions ,  -  B  log B  is  the  appropriate  entropy  form  to  use 
(or  restoring  their  images. 

Without  this  kind  of  blackbody  a  priori  information, 
we  could  still  make  independent  measurements  of  z  or 
of  the  n/z  ratio.  A  determination  of  source  size,  di/ 
from  the  spectral  line  shape,  and  detector  parameters 
will  give  z.  The  number  of  photons  n  is  presumed  mea¬ 
surable.  More  conveniently,  if  the  isotropic  flux  or  the 
related  brightness  B  (W  m‘*  sterad"1  Hz'^is  measured, 
then 

n/z  =  B( v)c*/2hv*  .  (3.7) 

The  factor  of  2  is  chosen  for  the  case  of  unpolarized 
brightness.  This  relation,  of  which  the  blackbody  dis¬ 
tribution  is  a  special  case,  can  be  seen  to  follow  from 
the  definition  of  B  and  from  Eqs.  (3.2),  (3.3),  and  (3.4), 
providing  1/t  is  simply  dv.  To  this  approximation  we 
may  apply  Eq.  (3.7)  to  estimate  n/z  for  nonthermal 
sources  such  as  the  many  radio  astronomical  objects 
where  synchrotron  radiation,  for  example,  plays  a 
dominant  role.  Equation  (3.  7)  tells  us  that  the  bright 
Double  Cygnus  A  (3C  405)  at  960  MHz  has  an  enormous 
n/z~  10*.  Similarly,  Centaurus  A  (CTA  59)  at  178 
MHz  has  a  very  large  n/z  -  104.  The  bright  radio  astro¬ 
nomical  objects  likewise  have  a  large  n/z,  and  the  logB 
entropy  expression  is  appropriate . 20  Images  of  less 
bright  objects  and  at  shorter  wavelengths  will  dictate 
the  -BlogB  form. 

For  the  optical  astronomical  objects  Frieden  worked 
with,  we  see  n/z  «  1  and  hence  -  BlogB  is  the  entropy 
expression  that  Eq.  (2. 12)  gives,  which  is  in  agreement 
with  Frieden’s  choice.  The  radio  astronomical  case  of 
Wernecke  satisfies  the  relation  n  »z  >  1,  and  hence  the 
logB  expression  of  Eq.  (2. 11)  is  the  right  entropy  ex¬ 
pression,  which  again  supports  Wernecke ’s  choice. 

IV.  EXAMPLE  OF  MANY  THROWS  OF  DICE 

The  main  purpose  of  the  present  paper  is  to  point  out 
the  importance  of  the  q,(n, )  factor  in  Eq.  (2. 1)  in  formu¬ 
lating  the  entropy  of  the  ME  method.  However,  we  must 
still  answer  the  legitimate  question  of  why  our  Initial 
and  fundamental  entropy  (2.1)  does  not  resemble  the 
expression  ordinarily  seen  for  entropy,  namely  -E/»ln/. 
or  -£/>,lnp,  where,  for  example,  /„  is  the  distribution 
function  and  pt  might  be  the  probability.  To  answer  this 
question,  we  start  with  an  example  of  counting  the  num¬ 
bers  on  many  throws  of  two  dice. 

In  the  existing  logB  school  of  ME  spectral  estimation, 
for  example,  the  basic  assumption  is  that  the  measured 
number  of  photons  n  can  be  taken  as  the  estimate  of  the 
average  number  of  photons  n,  in  unit  spectral  width  (or 
emitted  from  the  ith  cell  of  the  object).  This  is  differ¬ 
ent  from  the  basic  postulate  of  Sec.  U  that  the  number 
(not  the  average  number)  of  photons  of  the  ith  cell  is  nt. 
When  we  say  the  average  is  n(,  we  are  thinking  of  many 
cases  with  varying  individual  numbers.  For  example, 
when  two  dice  are  thrown  many  times,  the  average  of 
the  throws  is  7,  but  individual  throws  can  be  distributed 
between  2  and  12.  To  explain  the  relation  between  the 
point  of  view  of  Sec.  II,  which  is  based  on  one  fixed 
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number  and  the  other  point  of  view  based  on  the  av¬ 
erage  h(,  we  use  yet  another  dice  example  in  this  sec¬ 
tion. 

Suppose  we  throw  two  dice  many  times  and  ask  how 
often  the  number  «  appears.  Let  the  total  number  of 
throws  be  .Vf,  and  the  number  of  throws  in  which  n  ap¬ 
pear  be  Mf„.  Then  the  probability  that,  out  of  the  en¬ 
tire  M  throws,  2  appears  S1ft  times,  3  appears  Aff, 
times, . . . ,  is  written 

9(fl<  f' . /u)=  (Affj)!  (.Vf/3)!  ...  <M/U)! 

*<?(2)'"*9(3)'"3...<?(12)<'/i2 

u 

E/.  =  l  .  (4.1) 

*•2 

in  which  q(n)  is  the  probability  that  n  is  found  in  one 
throw.  The  previous  equation  (2.  2)  is  an  example  of 
q(n)  for  a  pair  of  fair  dice. 

When  we  throw  the  dice  a  very  large  number  of  times 
(Af  » 1 ) ,  the  number  of  times  n  appears  approaches 
Mf  i“’,  where  the  set  {/);“'}  maximizes  the  probability 
function  / j°\  ...,/[£’).  The  common  mathemati¬ 

cal  procedure  of  finding  a  maximum  of  such  a  function 
is  to  require  derivatives  of  its  logarithm  to  vanish.  So 
we  first  form  the  logarithm  using  Stirling’s  approxima¬ 
tion  (which  is  permitted  since  M  is  a  large  number)  as 
r  “  12 

Saln<P=M 

*’(■-§'•)]•  «-2> 

The  last  terms,  X(  ),  are  written  to  satisfy  the  normal¬ 
ization  of /„,  and  X  is  the  Lagrange  multiplier.  When 
we  maximize  S  in  Eq.  (4.2),  we  arrive  at  the  most  prob¬ 
able  distribution: 


Several  comments  are  in  order. 

(i)  The  <P(/2,/j, . . .  ,/i*)  expression  in  (4. 1)  is  a  term 
in  the  multinomial  expression.  The  expansion  can  be 
written 

[q(2)  +  q(3)  +  . .  .  +^(12)1  “  =  ^iV(/2,  f, _ fa)  , 

(4.4a) 

where  the  sum  £  is  done  under  the  condition 

M ft  +  Mfi  ♦  •  •  •  +  Mf„  =  Af  .  (4.4b) 

The  multinomial  expression  <y(/2,  fs, ....  fn)  is  used  ex¬ 
tensively  in  probability  theory. 21 

(11)  The  probability  expression  e(  f2,  f, . fu)  in 

Eq.  (4. 1)  is  based  on  Af  throws  of  the  dice.  A  collection 
of  repetitions  of  an  event  is  often  called  an  ensemble. 
Thus  we  can  say  f,  represents  the  distribution  of  u  over 
an  ensemble. 

As  we  see  in  Eq.  (4.3),  the  probability  of  a  single 
throw  q{n)  is  proportional  to  the  distribution  function 
over  an  ensemble.  In  the  same  way,  any  probability 
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consideration  (for  example,  the  probability  concept  used 
in  Sec.  II)  has  the  concept  of  an  ensemble  in  its  founda¬ 
tion.  Actually  the  probability  can  only  be  defined  using 
many  tries  [this  is  discussed  in  (iii)  below].  However, 
when  we  use  the  term  ensemble  in  the  present  paper,  it 
is  defined  in  a  limited  sense,  namely  that  the  average 
number  of  photons  n  is  fixed  in  the  ensemble,  but  that 
individual  n’s  in  the  individual  facsimiles  can  fluctuate. 

(iii)  For  fair  dice,  we  can  write  on  q(n)  of  Eq.  (4.3) 
as  given  in  Eq.  (2.2).  However,  if  the  dice  are  loaded, 
the  probability  of  finding  n  is  not  given  by  Eq.  (2.  2),  but 
can  be  a  complicated  function  of  n.  Even  in  such  a  case, 
many  throws  (the  limit  of  Af  -«)  can  yield  the  knowledge 
of  q(n)  through  Eq.  (4.  3).  In  Sec.  II,  we  used  the  quan¬ 
tum  nature  of  photons  and  accepted  the  formula  for 
<7y(n,)  as  given  in  Eq.  (2. 1). 

(iv)  The  reason  why  we  give  the  dice  example  in  this 
section  is  to  point  out  that  the  probability  expression 
(P(/,, .  . . ,  fu)  in  (4. 1)  is  nude  of  two  factors.  We  will 
call  the  combinatorial  factor  a1!  and  the  rest  <P2.  It  is 

a  common  practice,  which  we  do  not  follow  in  this  paper, 
to  call  lnd/  the  entropy;  ln<Pj  takes  the  familiar  form 

ln<Pl  =  -.tf£/,ln/„  •  (4.5) 

Jt 

The  most  probable  distribution  /£'”  is  derived  not  by 
maximizing  only  ln<p,  in  Eq.  (4. 1)  but  by  maximizing  the 
entire  probability  ln(P,  +  ln<P2  in  Eq.  (4.2).  Only  in  the 
case  in  which  (P2  is  a  constant,  independent  of  /,,  can  the 
most  probable  f\‘'  be  derived  by  maximizing  ln<Pi  alone. 
This  latter  case  will  be  discussed  as  a  special  case  of 
the  development  given  in  the  next  section. 

V.  FORMULATION  BASED  ON  THE  AVERAGE  n y 

In  Sec.  II  we  mentioned  under  the  classification  [A] 
that  when  z  =  1  we  cannot  reconstruct  the  image  using 
the  ME  concept  (and  Burg’s  case  belongs  to  this  class). 
This  is  because  the  probability  of  n,  photons  being  emit¬ 
ted  from  the  jth  cell  is  calculated  using  the  < it(nj)  ex¬ 
pression  of  (2. 1),  which  Is  identically  unity  for  one  de¬ 
gree  of  freedom. 

Even  when  2  =  1,  however,  we  can  talk  about  the  prob¬ 
ability  that  n  photons  come  from  the  cell  If  we  have  pre¬ 
vious  knowledge,  by  some  means,  of  the  distribution 
function  /,.  Actually  we  do  not  need  to  know  the  entire 
function  /,,  but  it  is  sufficient  to  identify  n,  as  an  esti¬ 
mate  of  the  average  number  n,  of  the  distribution.  When 
we  make  this  identification,  we  can  derive  the  most 
probable  distribution  f  J*1  and  the  probability  of  finding 
rt  associated  with  Thus,  we  formulate  the  present 
section  based  on  the  distribution  and  on  the  concept  of 
the  ensemble  (in  the  sense  we  defined  It  in  the  previous 
section).  This  treatment  allows  us  to  understand  some 
cases  of  the  logB  school  of  the  ME  estimation  method. 

We  consider  an  ensemble  made  of  a  large  number  M 
of  nearly  identical  facsimiles  or  sytems,  each  system 
representing  the  single  jth  cell  treated  in  Sec.  II.  The 
number  of  systems  which  have  n  photons  each  is  written 
fr\1.  This  function  /.  Is  not  the  so-called  cumulative 
distribution,  and  it  is  sometimes  called  the  probability 


density  function.  This  definition  implies  the  normaliza¬ 
tion  for  f„: 

£/.«l  •  (5.D 

The  set  {/,}  specifies  a  particular  state  of  the  ensemble. 

We  are  interested  in  counting  the  number  of  different 
ways,  which  we  designate  as  P  {/,},  that  an  ensemble 
can  be  constructed.  P  {/„}  is  made  of  two  factors.  The 
first  is  the  number  of  ways  of  making  a  distribution  {/„} 
over  the  ensemble: 

(*/.).  •  (5-2) 

•  n 

This  corresponds  to  the  combinatorial  factor  in  Eq. 

(4. 1).  The  second  factor  comes  from  the  fact  that  n 
photons  from  a  cell  have  the  a  priori  weight  or  degen¬ 
eracy  factor  q(n)  as  is  represented  in  Eq.  (2. 1): 

q(n)  =  («  +  z-1)!/h!(z-1)!  .  (5.3) 

Since  each  system  can  take  its  configuration  indepen¬ 
dently  from  the  other  systems  in  the  ensemble,  the  sec¬ 
ond  factor  <?{/„}  is  a  product  of  (5.3)  for  each  system 
and  is  expressed 

Q{/„}  =  II  <7(«)*""  .  (5.4) 

R 

The  total  number  of  ways  that  the  distribution  {/,}  is 
achieved  in  the  ensemble  is  then  the  product  of  the  two 
factors  0{/,}  in  (5.2)  and  Q{f„)  in  (5.4): 

=  «{/,} <?{/,}  •  (5.5a) 

This  P  is  of  exactly  the  same  nature  as  <?  in  Sec.  IV  and 
is  a  multinomial  expression: 

PW-nworn,'">"'-  ■  i5-5b’ 

n 

After  we  thus  define  the  distribution  {/,}  over  the  en¬ 
semble,  we  ask  for  the  most  probable  distribution  {/’*’} 
when  the  average  n  is  given  as  n.  This  distribution  {/i,01} 
must  be  the  one  which  maximizes  the  probability  (degen¬ 
eracy,  or  the  number  of  ways  of  constructing  the  ensem¬ 
ble)  P{/,}  in  (5.5).  Since  it  is  convenient  for  mathema¬ 
tical  reasons,  we  maximize  lnP{/„}: 

S{/.} » lnP{/J  =  lnn{/,}  +  ln<j{/,}  .  (5.6) 

One  may  call  either  Infi{/,}  the  entropy  or  the  entire 
lnP{/„}  the  entropy.  However,  there  is  no  such  arbitrari¬ 
ness  when  the  maximum  distribution  {/£“'}  is  to  be  de¬ 
rived.  The  function  to  be  maximized  is  not  lnfl{  /}, 
which  has  the  familiar  -  f  Inf  form  of  entropy,  but  the 
entire  laP{/,},  which  includes  the  a  priori  terms.  Since 
the  term  “maximum  entropy  restoration"  is  in  use,  we 
will  call  the  entire  expression  (5.6),  In P{0,  the  en¬ 
tropy  in  this  paper . 

Using  Stirling’s  approximation,  we  derive 

S{  f,\/M  =  -  £  f,( Inf,  -  l )  -  1  ♦  £  f.  Inqfn ) 

“  R 

-?'■)•  ,5-” 
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In  anticipation  of  the  maximization,  we  used  Lagrange 
multipliers  x  and  4  for  the  subsidiary  conditions  ex¬ 
pressed  by  Eq.  (5. 1)  and  by 

«  =  f»  .  (5.8) 

n 

The  most  probable  distribution  /J0>  is  derived  by  max¬ 
imizing  (5.  7)  as 


(n  +  z  -  1)! 
n!  (z  -  1)! 


(5.9) 


The  normalization  coefficient  e‘‘  is  determined  from  Eq. 
(5. 1)  as 


e 


\ 


-  V'  <”  +  z  ~  ^ ! 
"  £o  »!  (*  -  D! 


e 


-UH  _ 


1 

(l  -  e~“Y  ‘ 


(5.10) 


form  that  Burg  used  in  his  ME  treatment.  (The  mathe¬ 
matics  is  the  same  if  we  formulate  for  the  spectral 
analysis  problem. )  Our  derivation  of  (5. 16)  in  this  sec¬ 
tion  is  similar  to  the  method  used  in  Shannon’s  theory4 
of  communication.  In  Shannon’s  treatment,  the  frequen¬ 
cy  of  repetition  of  measurement  is  controlled  by  the 
sampling  theorem;  each  measured  sampled  point  there¬ 
fore  corresponds  to  one  degree  of  freedom,  mode  or 
state,  and  hence  z-  1  Naturally,  the  Gaussian  distri¬ 
bution  in  complex  amplitudes  for  the  classical  wave  rep¬ 
resentation  or,  equivalently,  far  the  exponential  dis¬ 
tribution  of  photons  plays  the  dominant  role  in  the  tradi¬ 
tional  information  theoretical  exposition  as  the  most 
probable  distribution,  and  for  these  cases  our  reasoning 
supports  Shannon’s  expression  of  information. 


Because  the  combinatorial  factor  in  (5.9)  is  thus  of  the 
expansion  coefficients  in  (5. 10),  the  distribution  in  (5.9) 
is  called  the  negative  binomial  distribution,22  and  (5.10) 
is  its  generating  function.  The  other  multiplier  >  is  de¬ 
termined  from  (5.8)  as 

4  =  In”  1*  .  (5.11a) 

n 

Substituting  this  in  (5. 10)  allows  us  to  write  X  as 

x  =zln^-i-i  .  (5.11b) 


Using  these  two  expressions  in  (5.9),  we  can  write  /*•* 
explicitly  as 


(n  +  z  -  1) ! 
n!  (z-1)! 


(5.12) 


When  the  entropy  expression  (5.7)  is  a  maximum  and /„ 
is  equal  to/J01  in  (5.  9),  we  can  show,  using  Stirling’s 
approximation,  that  the  entropy  corresponding  to  the 
most  probable  distribution  is 

=  (in  +  X  *  (ft  +  z)ln(n  +  z)  -nlnn  -  z  lnz  .  (5.13) 


An  entropy  expression  equivalent  to  (5.13)  was  derived 
by  Gamo25  for  partially  coherent  light  beams. 


We  examine  the  three  limiting  cases  corresponding  to 
those  presented  in  Sec.  n. 

[A]  When  z  =  1,  the  a  priori  probability  q(n)  in  (5.3) 
reduces  to  unity  so  that  the  general  entropy  expression 
(5.7)  simplifies  to  the  first  summation  -£/„lnf„;  this 
is  the  expression  of  the  entropy  used  by  Shannon  in  his 
information  theory.  In  this  case,  the  most  probable 
distribution  (5. 12)  is  reduced  to  the  exponential  distri¬ 
bution24  : 


and  the  entropy  expression  (5. 13)  becomes 

=  (n  +  l)ln(n  +  1)  -  n  Inn  .  (5.15) 

When  n  »1,  this  further  reduces  to 

S[C(/.M.l*lni  .  (5.18) 

This  equation  is  the  classical  wave  (field)  limit  of  the 
entropy  expression  for  the  Bose -Einstein  statistics. 

The  entropy  expression  in  (5. 18)  is  equivalent  to  the 


In  considering  the  distribution  function  f„  in  this  treat¬ 
ment,  Shannon  considers  n  to  be  a  continuum  variable. 
We  can  easily  check  that,  when  summations  over  dis¬ 
crete  n  used  in  deriving  (5. 15)  are  changed  into  inte¬ 
grations  over  the  continuum  of  n,  we  arrive  at  (5. 161 
rather  than  (5. 15).  So  far  as  our  formulation  is  con¬ 
cerned,  this  is  equivalent  to  assuming  n  1, 

In  deriving  the  negative  exponential  distribution  (5. 14), 
we  assumed  that  n  is  fixed.  This  procedure  is  equiva¬ 
lent  to  using  the  complex  wave  amplitude  formulation 
together  with  the  assumption  that  the  power  (i.e.  ,  the 
second  moment)  is  given  and  deriving  the  Gaussian  dis¬ 
tribution  in  complex  amplitude  as  was  done  by  many 
authors  in  communication,  astronomical  imaging,  and 
geophysical  applications.25 


It  is  noteworthy  that  the  distribution  function  (5. 12) 
for  a  general  z  (when  more  than  one  degree  of  freedom 
is  involved  in  the  considerations)  can  be  derived  from 
the  z  =  1  case  (5. 14).  As  an  example,  we  derive  the 
case  for  z  =  2.  The  distribution  function  for  this  case  is 
derived  using  the  property  that  the  distributions  in  each 
degree  of  freedom  are  independent  and  thus  the  joint 
probability  is  a  sum  of  products  of  (5. 14)  in  the  convolu¬ 
tion  form : 


By  comparing  this  with  (5.12),  we  can  verify  that  (5.  17) 
is  the  distribution  f{„°‘  in  (5. 12)  for  the  case  z  =  2  and 
N  =  2n.  Using  (5. 10)  and  the  generation  function  tech¬ 
nique,  it  can  be  found  in  general  that,  when  z  factors  of 
the  exponential  distribution  (5.14)  are  convoluted,  the 
negative  binomial  distribution  (5. 12)  results  with  the 
mean  of  the  resultant  distribution  equal  to  z  times  the 
original  average  n  in  (5. 14).  This  convolution  is  an¬ 
other  way  of  understanding  the  negative  binomial  distri¬ 
bution  (5. 12)  that  arises  when  more  than  one  degree  of 
freedom  is  involved.  This  result  also  confirms  the 
legitimacy  of  the  entropy  expression  (5.6)  in  which  both 
the  O  term  and  the  Q  term  are  included. 


[ B ]  When  z  n,  the  entropy  expression  (5  131,  using 
Stirling’s  approximation,  reduces  to 

5{/i'”}/.V  =  z  Inn  *  z  -  z  lnz  .  (5.  IB) 

This  corresponds  to  the  logfl  (Inn)  entropy  expression 
(2.  5),  recalling  that  z  is  a  constant,  and  n  is  the  vari- 


1662  J  Opt  Soc  Am  ,  Vol  67,  No  12  December  1777 


K  Kikut-hi  and  B  H  Softer  |r><x2 


151 


able.  The  other  terms  are  merely  additive  constants. 


[C|  When  z  is  large  compared  with  n,  we  can  use  the 
same  approximation  as  we  used  in  (2. 6)  and  reduce  /*•’ 
in  (5.9)  to 

f(o>  =  . 

(5.19) 

Further,  (5.11)  reduces  to 

4  =ln(z/n)  and  X  =  n  . 

(5.20) 

From  these  two,  we  obtain 

/r=^(»)7«!  , 

(5.21) 

which  is  the  Poisson  distribution.  The  entropy  expres¬ 
sion  (5. 13)  reduces  to 

=  n\nz  -  n  (Inn  -  1)  .  (5.22) 

This  corresponds  to  the  -BlogS  expression  in  (2.6), 
again  recalling  that  z  is  a  constant  and  n  is  the  variable. 
Equation  (5.  22)  is  the  classical  (i.e. ,  Maxwell-Boltz- 
mann)  particle  limit  of  the  Bose  statistics. 

The  Poisson  distribution  of  photons  in  this  limit  is 
well-established  experimentally.  The  derivation  of  the 
Poisson  distribution  in  this  limit  l C )  and  the  exponential 
distribution  in  limit  [A ]  supports  the  correctness  of  the 
analysis  in  this  section.  Many  authors  have  noted  the 
importance  of  the  Poisson  distribution  and  have  em¬ 
ployed  it  for  imaging9  and  communication29  studies. 
However,  they  did  not  derive  it  as  a  mosi  probable  dis¬ 
tribution  under  specific  conditions  as  they  did  for  the 
Gaussian,  but  adopted  the  Poisson  distribution  ad  hoc. 
The  reason  they25  could  not  so  derive  the  Poisson  dis¬ 
tribution  is  that  they  maximized  only  the  lnfl{/„}  term, 
leaving  out  the  In <?{/„}  term  in  (5.6).  In  evaluating  the 
probability  when  f„  is  of  the  Poisson  form ,  they  again 
incorrectly  used  -£  f„log/„  for  the  (logarithm  of)  prob¬ 
ability  (which  does  not  represent  the  total  probability  in 
the  Poisson  case).  However,  Renyi  and  McFadden27 
did  each  independently  derive  the  Poisson  distribution 
by  maximizing  the  probability  expression,  which  included 
the  appropriate  a  priori  term,  as  we  do  in  the  present 
paper.  Although  they  were  interested  only  in  the  prop¬ 
erties  of  the  probability  of  point  processes,  their  ap¬ 
proach  and  result  support  our  reasoning. 

One  objection  to  including  the  ln<j{/„}  term  in  the  en¬ 
tropy  expression  might  be  that  such  a  definition  would 
not  satisfy  the  intuitively  motivated  induction  condition, 
the  third  property  for  the  measure  of  information  postu¬ 
lated  by  Shannon:  namely,  that  if  an  original  choice  is 
composed  of  several  successive  choices,  then  the  mea¬ 
sure  of  information  should  be  expected  to  be  the  weighted 
sum  of  the  indivickial  measures.29  On  examination,  this 
condition  or  axiom  is  meaningful  only  when  the  choices 
are,  a  prion,  equally  likely;  it  is  not  suitable  when  they 
are  not.  When  the  choices  are  equally  likely,  the  inQ{/„} 
term  in  (5.8)  is  irrelevant  and  thus  the  -  /log/ form  of 
Shannon’s  information  results. 

VI  RELATION  BETWEEN  THE  TWO  DERIVATIONS 

In  Secs,  n  and  V,  we  derived  the  entropy  expressions 
for  a  cell  and  for  an  ensemble  of  cells,  respectively. 

By  comparing  the  two  results,  we  can  show  the  consis- 
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tency  of  our  analysis.  We  will  demonstrate  that  if  the 
exposure  time  of  forming  the  image  is  very  long,  for 
example,  the  analysis  of  Sec.  II  (which  is  based  on  a 
fixed  number  of  photons)  and  the  analysis  of  Sec.  V 
(which  is  based  on  a  fluctuating  number  of  photons)  agree 
with  each  other  in  the  entropy  expression  as  long  as 
n/z  is  the  same  in  both. 

We  rewrite  (2. 1)  for  one  cell  as 


In 


(w„  *zc  -  1)! 
(*e-D! 


(6.1) 


where  we  write  nc  and  zc  for  n ,  and  z,  respectively,  in 
(2, 1),  the  subscript  c  indicates  that  these  are  quanti¬ 
ties  for  a  cell.  The  ensemble  expression  (5. 13)  is 
written  as  it  is : 


S  =  Af((n  +  z)ln(n  +  z)  -  n  Inn  -  z  lnz]  .  (6.2) 


In  comparing  the  two  expressions,  is  the  number  of 
photons  detected  from  the  cell  of  the  area  «■*  within  a 
certain  specified  detection  volume  V-ctA  [Eq.  (3.4)]. 
The  quantity  zc  is  the  number  of  degrees  of  freedom  as¬ 
sociated  with  a  photon  coming  from  that  cell.  There  is 
no  restriction  on  the  values  of  nc  and  zc,  except  that 
they  be  non-negative  integers.  It  is  meaningful  to  ask 
about  the  relationship  between  (6. 1)  and  (6.  2).  For 
this  purpose,  let  the  detection  volume  V'  used  in  defining 
zc  in  (6. 1)  be  equal  to  M  times  the  corresponding  V 
used  in  (6.2).  This  introduces  the  correspondence 

nc  =  Mn  and  zc  =  Mz  .  (6.3) 

Substituting  (6.3)  into  (6. 1)  brings  sc  exactly  into  the 
form  (6.2)  for  S  when  we  use  the  approximation 


Mz  »1 


(6.4) 


together  with  Stirling's  approximation  (which  is  justified 
when  M  is  very  large).  It  is  to  be  noted  particularly 
that  the  limit  of  zc  »  1  in  (6. 1)  agrees  with  the  case  of 
z  =  1  in  (6.  2). 


VU.  FERMI  STATISTICS 

The  formulation  in  this  paper  has  been  based  on  pho¬ 
tons  and  the  Bose-Einstein  statistics.  It  is  natural  to 
ask  about  the  case  of  the  Fermi-Dirac  statistics  because 
an  image-restoration  problem  of  a  similar  kind  exists 
there  (for  example,  in  the  field  of  electron  microscopy). 

For  the  Fermi  case,  we  need  modify  Sec.  II  only 
slightly.  We  again  divide  the  object  space  into  cells. 

The  number  of  electrons  which  have  been  emitted  from 
the  fth  cell  withinthe  observation  time  I  is  written  as  n,. 
The  number  of  degrees  of  freedom  corresponding  to  the 
cell  is  written  as  z.  The  number  of  ways  q,  that  the  n, 
electrons  can  be  distributed  over  the  z  degrees  of  free¬ 
dom  is  written  by  the  binomial  expansion  coefficient 
(Fermi-Dirac  statistics) 

q i  =  z!  /(z  -  n,)!  »,!  ,  (7.  la) 

which  replaces  Eq.  (2. 1).  We  can  define  the  entropy 
for  the  cell  as  we  did  in  (2.3): 

s;  =  Inq,  .  (7.  lb) 

As  in  Sec.  n,  casefAj,  z  =  l,  yields  ',  =  0.  Different 
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from  previous  sections,  the  restriction  in  the  Fermi 
case  is 

(7.2) 

because  of  exclusion  principle,  and  hence  case  [ B ]  in 
Sec.  II  is  of  no  concern.  Corresponding  to  case  [C|, 
z  ■>>!,,  the  entropy  expression  for  the  entire  object  field 
becomes 

s*  £s,  =  nln(z/n)-  «  ^pydnp,- •  (7.3) 

i  i 

where  we  define,  as  in  (2.8), 


p^rtj/n  .  (7.4b) 

Since  case  [C)  is  the  classical  particle  limit,  it  is  natur¬ 
al  that  the  entropy  expression  (7.3)  is  exactly  the  same 
as  that  derived  for  the  classical  particle  limit  of  the 
Bose  case  (2. 12).  Hie  ensemble  formulation  can  be 
done  as  in  Sec.  V  by  replacing  q(n)  in  (5.3)  with  the  bi¬ 
nomial  expression  corresponding  to  (7.  la).  Again  for 
case  [C],  in  which  z  »n,  we  arrive  at  the  classical  par¬ 
ticle  expression  (5.22). 

Vin.  DISCUSSION 

In  clarifying  the  concept  of  entropy  discussed  in  pre¬ 
vious  sections,  it  is  important  to  comment  on  the  entropy 
as  used  in  a  class  of  problems  that  is  similar  to,  but 
distinctly  different  from,  ours.  In  statistical  mechanics 
as  well  as  in  other  fields,  the  entropy  can  always  be 
written  Ini',  where  T  is  the  number  of  microscopically 
distinguishable  different  ways  in  which  the  macroscopic 
configuration  of  the  system  can  be  arranged  under  given 
constraints  defined  in  the  problem,  whatever  the  prob¬ 
lem  may  be.  This  concept  of  entropy  Ini'  is  valid  for  a 
state  that  has  fluctuated  away  from  the  most  probable 
one  as  well  as  for  the  most  probable  one ,  and  is  a  gen¬ 
eralization  of  thermodynamic  entropy  that  is  defined  only 
for  the  equilibrium  state. 

When  the  problems  are  different,  the  entropies  are 
different.  Sometimes  confusions  occur,  however,  when 
expressions  of  entropy  which  resemble  each  other  are 
used  in  different  problems.  We  will  discuss  an  example 
that  has  caused  confusion  in  understanding  the  maximum 
entropy  image-restoration  problem.  In  treating  speckle 
patterns  formed  by  lasers,  Dainty*’  introduces  the  prob¬ 
ability  density  function  p(l)  of  light  intensity  I,  indepen¬ 
dent  of  where  in  the  pattern  1  exists,  and  writes  the  en¬ 
tropy  as 

S  =  - f  p(I)\np(I)dl  .  (8.1) 

This  is  the  legitimate  entropy  function  for  his  problem 
when  no  contribution  from  the  a  priori  probability  is  as¬ 
sumed.  But  the  question  he  is  trying  to  answer  is  dif¬ 
ferent  from  ours  and  hence  the  physical  content  of  the 
entropy  expression  (8. 1)  is  different  from  ours. 

In  the  speckle  pattern  statistics,  in  deriving  (8. 1)  one 
calculates  the  number  of  ways  T0  that  all  the  different 
spatial  patterns  can  be  formed  consistent  with  the  un- 
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known  intensity  probability  distribution  \p(I)\  to  be  de¬ 
termined  and  writes  the  entropy  lnr0.  This  formulation 
of  the  problem  does  not  lead  to  a  unique  two-dimensional 
pattern  corresponding  to  a  maximum  of  the  entropy. 

Many  patterns  can  be  found,  each  of  which  satisfies  the 
calculated  distribution  function  corresponding  to  the 
maximum  entropy. 

In  the  problem  of  maximum  entropy  image  restora¬ 
tion,  on  the  other  hand,  the  value  of  the  intensity  of  each 
cell  over  a  two-dimensional  space  is  given  an  assumed 
fixed  trial  value.  We  ask  in  how  many  ways  F,  the  in¬ 
tensity  at  the  cell  t  can  be  formed  by  taking  into  account 
the  a  priori  degeneracy  associated  with  the  intensity  at 
the  point  and  any  ensemble  contribution  if  appropriate. 
The  entropy  is  then  derived  as  Ini',  for  the  cell  i  and 
then  summed  over  all  cell  positions  over  the  two-dimen¬ 
sional  space.  This  is  what  we  explained  in  Secs.  II  and 
V.  The  entropy  is  then  maximized  and  a  unique  two- 
dimensional  pattern  is  derived  corresponding  to  it. 

The  basic  difference  between  the  speckle  pattern  sta¬ 
tistics  problem  and  the  maximum  entropy  image  prob¬ 
lem  is,  therefore,  that  in  the  latter  a  unique  pattern  is 
calculated  corresponding  to  the  maximum  entropy, 
whereas  in  the  former  a  group  of  many  patterns  is  derived 
from  the  intensity  distribution  corresponding  to  the  maxi¬ 
mum  entropy  of  that  problem.  This  difference  leads  to  an¬ 
other  important  consequence  with  regard  to  the  "smooth¬ 
ness"  and  the  "disorder”  of  the  pattern  when  the  entropy 
is  maximized.  Suppose  one  asks  for  the  pattern  corre¬ 
sponding  to  the  maximum  of  the  entropy  in  the  speckle 
statistics  example.  When  (8. 1)  is  maximized  with  only 
the  constraint  that  the  average  (  /)  of  /  is  given  (together 
with  the  normalization  constraint),  thenthe  result  is  that 
p(/)  =  (/>‘1ex p(—  //< /)).  This  means  the  local  intensity 
(/)  can  take  various  different  values  and  hence  the  actual 
patterns  are  not  "smooth."  On  the  other  hand,  when  the 
entropy  of  the  maximum  entropy  image-restoration 
problem  is  maximized  with  only  the  constraint  that  the 
average  intensity  is  given  (/>,’ s  are  normalized),  the 
local  intensities  pt  (i  indicating  the  ith  cell  location)  are 
constant  and  independent  of  i,  resulting  in  a  “smooth” 
flat  pattern  with  the  same  intensity  everywhere.  Such 
a  flat  pattern  in  the  speckle  statistics  would  be  expressed 
p(/)  =  5(/-/0)  rather  than  the  negative  exponential  distri¬ 
bution.  This  is  only  an  illustration  of  the  difference;  it 
is  not  meant  to  imply  that  the  one  method  always  gives 
"smooth”  and  the  other  method  "not  smooth"  results. 
Different  constraints  could  reverse  the  situation.  We 
avoid  discussing  the  poorly  defined  concept  of  "disorder" 
in  the  context  of  two-dimensional  patterns.  Disorder 
should  not  be  confused  with  the  degree  of  smoothness, 
as  they  are  independent  concepts. 

The  two  problems  compared  in  this  section  use  the 
same  basic  concept  of  the  entropy  Ini',  but  the  way  the 
maximum  entropy  behaves  seems  quite  different;  this 
is  only  because  the  problems  are  different. 

IX.  SUMMARY  AND  CONCLUSION 

In  the  maximum  entropy  (ME)  image-restoration  for¬ 
mulation,  there  are  two  different  expressions  for  entropy 
now  in  use;  for  short,  we  call  them  the  log B  and  the 
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-  BlogB  expressions  (B  being  the  local  brightness  of 
the  object,  or  the  special  spectral  energy).  In  the 
present  paper,  we  develop  a  general  form  for  entropy 
that  should  be  used  in  the  ME  restoration  method,  and 
classify  the  conditions  under  which  the  above  two  ex¬ 
pressions  are  valid. 


We  use  the  original  definition  of  entropy  by  Boltzmann 
and  Planck,  and  find  the  most  probable  object  pattern 
(restricted  by  the  measured  image).  In  other  words,  the 
entropy  to  be  maximized  is  the  logarithm  of  the  proba¬ 
bility  that  an  object  pattern  occurs. 


The  object  pattern  is  defined  by  the  distribution  of  the 
number  of  photons  n,  over  the  cells  i  =  1,  2, .  .  .  in  the 
object  space.  The  probability  [<?,(n()]  mentioned  in  the 
preceding  paragraph  is  discussed  in  Sec.  □,  and  is 
based  on  the  Bose -Einstein  statistics,  which  photons 
obey.  It  is  proportional  to  the  number  of  ways  the  n, 
photons  (in  the  ith  object  cell)  can  be  distributed  over  2 
degrees  of  freedom : 


(n,»z-l)! 


Section  III  discusses  in  detail  how  2  can  be  estimated. 
Based  on  this  expression  for  q,(.nt),  Sec.  D  interprets 
the  two  entropy  expressions  as  the  two  limiting  cases : 
[B 1  when  n,  >>2  >  1,  log<j((n()  leads  to  the  logB  expres¬ 
sion  for  the  entropy,  and  [C]  when  z  »n(,  it  leads  to  the 
-BlogB  form  of  entropy. 


The  relation  between  the  entropy  s,  -  log<?((n,)  and  the 
entropy  familiar  in  information  theory  is  discussed  in 
Sec.  V,  using  the  ensemble  for  which  nt,  an  average 
over  the  ensemble,  is  specified  rather  than  n,.  With  n 
sjiecified,  the  two  distinctions  [  B  ]  and  [C|  above  remain 
the  same.  The  third  case,  [A],  when  n,  »z  =  1  (the  cur¬ 
rent  applications  of  the  ME  method  for  power  spectral 
estimation  belong  to  this  limit)  leads  to  the  log-B  expres¬ 
sion  when  the  ensemble  formulation  is  used. 
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Comments  on  “Spectral  Estimation:  An  Impossibility?" 

B  H  SOI TER  4 mi  RYOICHI  K1KUCH1 

Abstract  -Spectral  estimation  is  possible  from  a  finite  portion  of  the 
autocorrelation  function.  The  ambiguity  of  a  multiplicity  of  permissible 
spectra  all  consistent  with  the  data  is  the  very  reason  why  it  is  possible 
because  each  possible  spectrum  has  a  different  probability.  Maximum 
entropy  estimation  (MEE)  picks  the  most  probable  one  as  the  estimate. 

In  a  recent  letter  (1  j.  Nitzberg  questioned  the  possibility  of  making 
spectral  estimates  using  only  a  portion  of  the  autocorrelation  function 
of  the  data  because  of  the  ambiguity  arising  from  the  multiplicity  of 
possible  spectra  that  are  consistent  with  the  data.  Restricting  the  class 
of  spectra  to  a  particular  class  of  network  models  as  Nitzberg  would 
suggest,  or  equivalently  determining  the  appropriate  underlying  statistics 
(e.g.,  the  all-pole  network  model  in  estimation  theory  or  equivalently 
its  underlying  Gaussian  statistics  in  complex  amplitudes)  [2]  still  would 
leave  the  ambiguity  of  a  multiplicity  of  possible  spectra  all  compatible 
with  the  given  data.  The  multiplicity,  however,  is  the  very  reason  that 
we  can  employ  the  tools  of  estimation  theory.  Estimation  theory  is 
probabilistic  and  provides  both  an  estimate  and  a  degree  of  confidence 
in  the  estimate. 

MEE,  for  example,  chooses  the  most  probable  member  of  this  set  of 
possibilities  for  the  estimate  (2) .  Entropy  is  defined  in  this  context  as 
the  logarithm  of  the  probability  that  a  spectrum  occurs.  The  details  of 
this  interpretation,  which  makes  explicit  the  probabilistic  foundation 
of  MEE,  can  be  found  in  Reference  (2) . 


Manscript  received  March  23,  1979.  This  work  was  supported  by  the 
Air  Force  Office  of  Scientific  Research  under  Contract  F49620-77-C- 
0052. 
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Reply 1  by  R.  Mitzberg2 

Softer  And  Kikucki  state  that  spectral  estimation  is  possible.  They 
recommend  the  MEM  and  justify  the  recommendation  by  stating  the 
well-known  property  that  the  technique  maximizes  a  particular  integral 
involving  the  power  spectrum  (entropy)  under  the  constraint  that  cer¬ 
tain  other  integrals  involving  the  power  spectrum  are  specified  values 
(the  known  sampled  autocorrelation  values).  From  the  viewpoint  of 
modern  statistical  estimation  theory,  this  particular  estimator  is  one  of 
a  nondenumerable  set  of  possible  estimators.  Thus  is  should  be  clear 
that  the  question  of  whether  or  not  it  is  possible  to  estimate  spectra 
cannot  refer  to  whether  an  estimator  exists  but  whether  it  can  be 
shown  that  a  particular  one  has  some  desirable  property. 

The  usual  estimation  theory  criterion  for  choosing  a  particular  es¬ 
timator  is  small  error.  Often,  it  is  not  known  how  to  best  select  one 
estimator  from  the  set  of  all  possible,  and,  in  order  to  simplify  the 
problem,  the  set  of  estimators  that  is  being  considered  is  restricted.  In 
one  sense,  the  restriction  is  arbitrary  but  in  another  it  is  not.  The 
restriction  is  made  to  a  set  with  tractable  mathematical  properties. 
Thus  sometimes  estimators  are  constrained  to  be  linear  functions  of 
the  data  for  no  reason  other  than  nonlinear  functions  are  hard  to 
analyze  However,  there  are  many  well-known  cases  where  the  best 
linear  estimators  have  substantially  larger  error  than  the  best  estimator. 

This  philosophy  of  estimation  theory  relates  to  MEM  spectral  estima¬ 
tion  It  is  agreed  that  MEM  is  an  extremely  valuable  technique.  Ex¬ 
amples  abound  of  the  superiority  of  this  technique  compared  to  more 
conventional  spectral  estimation  techniques.  However,  as  with  esti¬ 
mators,  does  the  MEM  procedure  always  have  small  error9  As  shown 
by  Gulowski  et  a!  [3|.  there  are  examples  where  MEM  gives  very  bad 
results  A  question  of  interest  is,  can  it  be  predicted  when  (or  if)  MEM 
is  applicable9 

A  main  concern  when  using  MEM  should  be  that  tne  concept  of 
maximizing  entropy  is  extremely  nebulous  and  it  may  not  be  a  reason¬ 
able  criterion  Spectral  estimation  by  MEM  is  a  restriction  to  a  class  of 
estimator*  with  tractable  mathematical  properties  just  as  is  the  afore¬ 
mentioned  restriction  to  linear  estimators.  This  restriction  is  clear 
when  the  MEM  procedure  (as  shown  by  Vanden  Bos  (4{)  is  recast  into 
the  all-pole  network  algorithm.  He  shows  that  MEM  is  equivalent  to 
computing  the  M  coefficients  of  an  all-pole  network  that  fils  the  known 
M  values  of  the  autocorrelation  function.  If  the  assumption  of  an  M 
pole  network  fits  the  problem  at  hand,  this  is  a  reasonable  procedure. 
If  the  assumption  is  totally  unreasonable  it  should  be  discarded.  It  it 
is  not  known  whether  or  not  the  assumption  is  reasonable,  one  should 
be  concerned  and  not  adament  that  this  is  the  “best’*  procedure.  As 
one  example  of  this  flexibility,  even  when  a  data  stream  of  M  points  is 
obtained,  so  that  an  M  point  autocorrelation  function  sequence  can  be 
estimated,  the  number  of  feedback  coefficients  of  the  all-pole  network 
fthe  number  of  autocorrelation  values  estimated)  is  often  taken  as 
substantially  less  than  M  In  practice,  there  is  not  a  unique  MEM 
estimated  spectrum  but  many  depending  upon  how  many  poles  are 
used  in  the  estimating  network.  Other  criteria  are  then  imposed  to 
choose  the  preferred  MEM  spectrum  f 5 1 . 

To  summarize  the  above,  there  are  a  nondenumerable  number  of 
spectral  estimators.  Some  of  these  can  be  phased  in  terms  of  networks 
with  poles  and  zeroes.  The  restriction  to  estimation  using  the  all-pole 
network  (equivalent  to  maximizing  entropy)  is  made  primarily  to 
simplify  the  mathematics.  This  is  an  extremely  valuable  property  The 
aspect  of  simplification  is  emphasized  not  to  denigrate  the  technique. 


1  Manuscript  received  June  25,  1979. 
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but  to  clarify  under  whai  conditions  is  should  be  expected  that  the 

procedure  will  work  well  It  is  known  that  it,  in  truth,  the  network 

generating  the  data  is  an  all-zeru  network,  MEM  works  less  well  than 
other  less  easily  computed  procedures  It  enough  about  the  true 
state  of  nature  is  known  so  that  the  physics  of  the  problem  predicts 
troubles  for  MEM,  do  not  use  it  It  the  physics  ot  the  problem  is 
completely  unknown,  spectra]  estimation  is  impossible  in  the  sense  ot 
guaranteeing  good  estimation  of  the  spectral  density 
It  is  historically  obvious  tha1  even  lui  this  latter  conditon  ut  lack  ot 
knowledge,  the  measured  data  does  give  considerable  information  about 
the  spectrum  This  is  because  when  a  truncated  autocorrelation  turn, 
tion  is  available,  this  can  allow  exact  evaluation  of  some  integral'  ■  t 

the  spectrum  Note  that  the  output  a  spectrum  analyzer  is  given  by 

integrals  of  this  form  Specifically,  when  a  random  process  is  passed 
through  a  filter,  the  output  power  P0  is  given  by 

l\,  =  J  Sxif)  H(f)  2  df  1 1 ) 

where  Sx{f)  is  the  input  process  and  Hif)  is  the  transfer  function  ot 
the  filter  By  Farcevafs  theorem,  it  is  jlso  given  by 

P0  -  f  Rxlr)Rn{r)dT  i2) 

where  Rx(t)  and  rtn(r)  are  the  autocorrelation  functions  ot  the  pro¬ 
cess  and  filter,  respectively.  It  the  filter  impulse  response  is  of  finite 
duration  7/.,  then  Rn(r )  is  of  duration  2 Tp.  The  situation  stated  is 
that  Rx(t)  is  known  exactly  for  , rl  <  T\  When  27/.-  is  less  than  T\. 
the  integral  in  (2)  can  be  evaluated  exactly.  For  a  spectrum  analyzci 
application,  this  is  approximately  equivalent  to  the  statement  th-1*' 
when  the  process  is  passed  through  a  narrow-band  filter  tuned  to  at. 
center  frequency,  the  filter’s  power  output  can  be  determined  as  long 
as  the  reciprocal  of  the  filter  bandwidth  is  larger  than  T ,  Thus 
though  the  detailed  structure  of  the  power  spectrum  cannot,  in 
general,  be  determined  on  the  basis  of  the  truncated  autocorrelation 
function,  the  output  of  a  spectrum  analyzer  can  be  computed  when 
the  spectrum  analyzer’s  frequency  resolution  is  not  "excessive.'' 

Further  Comments  by  B.  H.  Suffer  and  R.  Kikuchi 
We  agree  that  it  would  be  unreasonable  to  use  the  MEM  when  the 
physics  of  the  problem  is  not  known.  When  known,  the  concept  of  the 
MEM  is  not  nebulous.  The  MEM  is  not  limited  to  the  form  introduced 
bv  Burg  based  on  Gaussian  statistics,  other  forms  of  entropy  may  lx1 
dictated  by  the  problem  at  hand  [2] 
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